Seven Years of Spaced Repetition Software in the Classroom

post by tanagrabeast · 2021-03-04T02:42:01.475Z · LW · GW · 37 comments

Contents

    
    Summary (spoilers)
    Intro and response to Piotr Woźniak
    Prologue
  Part 1: Everybody Poops
    A great and terrible vision
    In defense of forgetting
    In defense of remembering
    The value proposition of classroom SRS
    How I currently use classroom SRS
    Against apathy, toward apprenticeship
  Part 2: A Third Year of Spaced Repetition Software in the Classroom (2017)
    Synopsis and disclosure
    Setup
    
    of friction
    with multiple-choice study cards
    failure modes
    the blame game
    and discussion
      Table
      Study time analysis
      SRS signal, or just conscientiousness?
      People persons
      High automaticity in high achievers
      Weak transference?
      Drama benchmark analysis
    (2017)
  Thesis, Antithesis, Synthesis (2021)

Description

This is a reflective essay and report on my experiences using Spaced Repetition Software (SRS) in an American high school classroom. It follows my 2015 and 2016 posts on the same topic.

Because I value concise summaries in non-fiction, I provide one immediately below. However, I also believe in the power of narrative, in carefully unfolding a story so as to maximize reader engagement and impact. As I have applied such narrative considerations in writing this post, I consider the following summary to be a spoiler.

I’ll let you decide what to do with that information.

Summary (spoilers)

My earlier push for classroom SRS solutions was driven by a belief I came to see as fallacious: that forgetting is the undoing of learning. This epistemic shift drove me to abandon designs for a custom app that would have integrated whole-class and individual SRS functions.

While I still see value in classroom use of Spaced Repetition Software, especially in basic language acquisition, I have greatly reduced its use in my own classes.

In my third year of experiments (2016-17), I used a windfall of classroom computers to give students supervised time to independently study using an SRS app with individual profiles. I found longer-term average performance to be slightly worse than under the whole-class group study model, though students of high intelligence and motivation saw slight improvements.

Intro and response to Piotr Woźniak

I have recently received a number of requests to revisit the topic of classroom SRS after years of silence on the subject. Understandably, the term “postmortem” has come up more than once. Did I hit a dead end? Do I still use it?

Also, I was informed that SRS founding father Piotr Woźniak recently added a page to his SuperMemo wiki in which he quoted me at length and claimed that SRS doesn’t belong in the classroom.

Well, I don’t have much in the way of rebuttal, because Woźniak’s main goal with the page seems to be to use my experience as ammunition against the perpetuation of school-as-we-know-it, which seems like a worthy crusade. He introduces my earlier classroom SRS posts by saying, “This teacher could write the same articles with the same conclusions. Only the terminology would differ.” I’ll take that as high praise.

If I were to quibble, it would be with the part shortly after this, where he says:

The entire analysis is made with an important assumption: "school is good, school is inevitable, and school is here to stay, so we better learn to live with it".

Inevitable? Maybe. Here to stay? Realistically, yes. But good? At best, I might describe our educational system as an “inadequate equilibrium”. At worst? A pit so deep we still don’t know what’s at the bottom, except that it eats souls.

Other than that, let me reiterate my long-running agreement with Woźniak that SRS is best when used by a self-motivated individual, and that my classroom antics are an ugly hack around the fact that self-motivation is a rare element this deep in the mines.

Anyone who can show us a way out will have my attention. In the meantime, I’ll do my best to keep a light on.

Prologue

At the end of my 2016 post, I teased a peek at a classroom SRS+ app I was preparing to build. It would have married whole-class and individual study functions with some other clever features to reduce teacher workload.

I had a 10k word document in hand: a mix of rationale, feature descriptions, and hypothetical “user stories”. I wasn’t looking for funding or a co-founder, just some technical suggestions and moral support. I would have been my own first user, and I had to keep my day job for that anyway.

But each time I read my draft, I had this growing, sickening sense that I was lying to myself and my potential customers, like a door-to-door missionary choking back a tide of latent atheism. And I should know, because the last time I had felt this kind of queasiness I was a door-to-door missionary choking back a tide of latent atheism.

I thought maybe this was just the kind of general self-doubt common to anyone undertaking something audacious, but I paused my work on it for another school year while I tried the obvious thing: providing students individual SRS app profiles and supervised class time in which to use them.

This is a two-part essay, and in Part 2, I’ll tell you how that went. But in Part 1, I’m going to make the case that Part 2 doesn’t matter very much.

Part 1: Everybody Poops

A great and terrible vision

As I wrapped up my Third Year experiment, I again tried to sort out my feelings about my visionary SRS app design, which I hadn’t updated despite a year of fresh experience. Was it just self-doubt?

The fact that I could only code at a minimal hobbyist level didn’t feel like the biggest hurdle. I think I could have picked up enough skill in that area. But even with a magical ability to translate my vision into code, I would have been up against a daunting base rate of failure for education startups. Also, I didn’t consider myself a very typical teacher: What sounded brilliant and intuitive to me would probably seem pointless and nonsensical to 95% of my peers.

Still, I pulled out my Eye of Agamotto and checked out all of the futures where I developed the app. In almost all of these, nothing came of it. But in the few where my app saw high adoption, the result was… dystopia! Students turned against their teachers, and teachers against their students. Homework stretched to eternity. Millions of children cursed my name. The ‘me’ in these futures wore an ignominious goatee and a haunted stare. 

Used judiciously for the right concepts, in the right courses, by the right teachers, I still think my imagined app could be a powerful tool. But I don’t see any way to keep it from being abused. Well-intentioned teachers would put too much into it and demand too much from students. Any safeguards I put in to prevent this would just invite my app to be outcompeted by an imitator who removed these safeguards (which would seem arbitrary and restricting to most users).

I’m convinced of this because the me who wrote the original “A Year of Spaced Repetition…” post would have abused it. Let’s see... He was averaging seven new cards a day? (That’s 2-3 times what I would recommend today.) He uncapped the 20 new card/day limit? He knew even then that he was adding too many cards, but failed to cut back the following year? I’m not encouraged.

“But wait,” you say. “You didn’t think you were a typical teacher. Maybe a typical teacher could be trusted?”

No.

In defense of forgetting

The “problem” is that teachers instinctively introduce far more content than students can be expected to remember. This was obvious to me when I was averaging seven new cards a day, which still felt like a brutal triage of my total content.

Covering more material than can be retained isn’t bad teaching, though. In fact, it’s a good and necessary practice. Content — the more the merrier — is the training data the brain uses to form and refine mental models of the universe.[1] These models tend to be long-lived, and allow the brain to re-learn the content more deeply and efficiently if it ever comes up again. They also allow it to absorb new-but-conceptually-adjacent content more readily. In cognition, as in nutrition, you are what you eat — and good digestion naturally produces solid waste. The original training data is subject to lossy compression, with only a few random fragments left whole and unforgotten. (Tippecanoe, and Tyler Too! The mitochondria is the powerhouse of the cell!) Such recollections are corn kernels bobbing top-side up in a turd floating down the river Lethe.

This is normal and fine. Regular, even.

But the educational establishment doesn’t see it that way. The teacher I was seven years ago didn’t see it that way. And I now realize that the teacher I was five and six years ago had queasy feelings because he was starting to see it that way. Following my gut, without fully understanding or even entirely registering what I was doing, I slowly turned around and started walking the other way, abandoning my app design and the unfinished “Third Year” report.

The orthodox view equates forgetting with failure. It’s not “Everybody poops”. It’s “Poop is inadequate. How can we get more corn, less poop?” This belief is implicit whenever someone laments the “summer slide”, or opines that students missing school during the Covid pandemic are “losing” months of learning — as if kids are spinning their progress meters backwards, just pooping away without anyone trying to stop them. Under this view, we keep kids in school partly to stop the leaks, and partly to stuff them with new knowledge faster than they can expunge old knowledge.

If this is how you see education, SRS is a tool to keep students from pooping. It offers the tantalizing possibility of learning without forgetting. Two steps forward, no steps back. Why wouldn’t you push it as hard as possible?

Don’t get me wrong. All else being equal, learning without forgetting would be great. But the most important effects of learning — lasting changes to our mental machinery — happen whether or not we forget the content. Once the lesson is over, dear teacher, your best shot at lasting growth has already left the harbor. So why are you still trying to hold back the tide? Why are you planning to punish your students for pooping on Tuesday, the day before your test, instead of Thursday, the day after it?[2]

In defense of remembering

This is not a “How I Learned to Stop Worrying and Love Forgetting” essay. I don’t love forgetting. I will be the first to argue the merits of not forgetting right away. The longer we can keep ideas floating around in our heads, the greater their “cross-section”, as I put it in 2016, with more opportunities to make associative connections that cause useful long-lived updates to our mental models.

Unfortunately, I have not found SRS to be great at fostering the sorts of reflective mental states conducive to insight, except when studying on my own at a deliberately slow pace, as while on a walk. In such a use case, SRS no longer has quite the time-efficiency advantage that is its main selling point. The opportunity cost of using it goes up. In a whole-class SRS session, long reflective pauses between cards would invite frustration and misbehavior, and we wouldn’t get through very many cards.

In defense of remembering, I will also argue that some skills are simply impossible without a continuous retention of specific dependencies. These skills tend to be technical. Heck, this might be the definition of a technical skill.

With a few mostly upper-level exceptions, though — math, physics, chemistry — most of what we teach in school is more conceptual than technical. We make you take history so you have a better model of how civilizations and governments work, not so you remember who shot Alexander Hamilton. We make you take English to improve your word-based input and output abilities, not so you remember the difference between simile and metaphor. At least, I hope we do.[3]

Besides, even in the technical classes, forgetting is the near-universal outcome, and the long-term benefits are mostly conceptual — for if you don’t use these skills continuously for the rest of your life, you’re almost certainly going to lose them. Maybe more than once.

I’ve forgotten algebra twice. I’ve forgotten how to write code at least three times. I can’t do either one at the moment. But I’m still changed by having known them. I have an intuition for what sorts of problems ought to be mathematically solvable. I can think in terms of algorithms. And I could relearn either skill more easily than on the first or second occasions. Also, relearning has an anecdotal tendency to deepen understanding in a way that continuous retention may not, especially when approached from a different direction.

Still, as long as I’m defending retention, I think it’s valid to ask whether we should force kids (and often, by extension, their parents) to relearn math every frickin’ year. Consider: The conventional wisdom is that technical companies begrudgingly expect to have to (re)train most new workers in the very specific areas they need. They look to your resume and transcripts mostly for evidence that you have learned technical skills before and can presumably learn them again. I don’t think they care if you’ve re-learned them three times already instead of six. So, if we’re going to force kids to demonstrate intermediate math chops to graduate (a dubious demand), perhaps we could at least wait until the last practical moment, and then do it in bigger continuous lumps — like two-hour daily block classes starting in grade 9 or 10 — so they would have fewer opportunities to forget as they climb the dependency pyramid. Think of the tears we could save (or at least postpone).

The value proposition of classroom SRS

Anyway, classroom SRS has its strengths, but midwifing conceptual insights doesn’t feel like one of them. I think it’s also reasonable to assume that students forget almost everything from a classroom SRS deck as soon as they stop using it.

Adjusting for these two assumptions, the terrain where classroom SRS can beat out its opportunity costs dramatically shrinks. But I believe it still exists, at the intersection of high automaticity targets and medium-term objectives.

With high automaticity targets, what you’re trying to train is a reflexive response to a stimulus that is going to look a lot like the study card. Foreign language vocabulary is my poster child for this. You’re not drilling the words to unearth insights. You’re drilling for speed, so that your students can keep up when a word pops up in a real-time conversation.

You’re also trying to drill away the need for conscious awareness. You want that front-side combination of sounds or letters to cause the back-side set of sounds or letters to pop automatically into their heads. This is my intent when I drill my English students in word fragments (prefixes, roots, suffixes), which are really just bits of foreign language (Greek, Latin). If it’s not automatic, then they’ll gloss right over the possible meaning of “salubrious”, even though they have learned that “salu” usually means “health”.

By medium-term objective, I mean “I want my students to have automatic fluency with the content of these cards on Day X”, where X is a date between one week and three months in the future. It shouldn’t be sooner than that, in accordance with Gwern’s “5 and 5” rule: You probably need at least five days to get any real advantage from SRS. And it shouldn’t be later than a few months, for two reasons: First, we’re assuming the students will forget it all once they stop studying, which is all but guaranteed after the end of the course; there’s little point in keeping those cards in rotation after Day X. Second, I probably don’t want to start those cards until the last practical minute, which is unlikely to be more than three months ahead of time.

Why three months and not six? It’s not a hard-and-fast rule, but from the experience of my first three years of classroom SRS, if you’re trying to retain things for more than a few months, the total number of cards is likely to become greater than you can productively study every day, and many cards will languish unseen. Plus, your roster can change, especially over a semester break. The set of students you have in six months might only have 70% overlap with the set you have now. Really, you should wait until the last practical minute.

But what constitutes a worthy “Day X”? It might be a test. But if it’s your test, you may not have been listening. Your test may just be arbitrarily punishing some kids for forgetting a little sooner than others. However, if it’s an external test, with high stakes for you and your students, then it could be a worthy Day X indeed. For me, Day X is the day of the big state test — the one used to compare students to students, teachers to teachers, and schools to schools.

When your students do well on an external test, though, please keep a healthy perspective. A high test score doesn’t mean they can do the hard things now and forever. It means they were able to earn a high test score on Day X. They will forget almost all of it afterwards. But you will have given them their best chance to signal to others that they can learn hard things, and that you can teach them hard things, and that your school has teachers who can teach hard things.

Day X doesn’t have to be a test. If you’re optimizing for brain change that persists after they forget all of your content, Day X could be an immersive event. Maybe your Spanish class is going to Madrid. You know they will have a deeper experience if you can bring their vocabulary to a peak of richness and automaticity on the eve of departure. Yes, they’ll still forget almost all the words later. But they might retain a glimpse of how the world looked when seen through another language.

Maybe your event is smaller. A virtual trip. An in-class conversation day where we pretend we’re at the beach (“¡En la Playa!”). Maybe their long-term takeaway will be an appreciation for how different languages use different grammars, which is not something most people even consider until they’ve studied a second language. Get their mental gears turning hard enough, and they might even see grammar as an arbitrary construct with tunable parameters and tradeoffs that influence what can be communicated easily. Maybe they’ll independently rediscover the Sapir-Whorf Hypothesis. But they’re not going to remember how to say ‘sand’. Nope. ‘Shark’, maybe (¡Tiburón!). But you can’t predict this, and it’s probably not worth the effort to try. 

But maybe you’re not teaching a foreign language. No matter your subject, Day X could be any conceptually demanding lesson or unit that is difficult to even talk about without fluency in a given set of terms. These aren’t very common in 10th Grade English, though they come up more often in my Creative Writing class. In these cases, however, the dependent terms are conceptually rich enough that they don’t lend themselves very well to cards, and I find it’s better to just quickly re-teach them in front of the lessons that use them. “Remember how we said...”[4]

How I currently use classroom SRS

As you may have guessed, I’ve radically scaled back my usage of classroom SRS since those first three years. In fact, for the last four years, I’ve only used it during a two-to-three month span leading up to the state test. And for the last two of those years, I’ve only used it for word fragments. I’m very unlikely to abandon its use for word fragments, though, because the most important thing I teach my students by using SRS is the existence of SRS. Word fragments are my favorite way to demonstrate how efficient study time can be. I add no more than about ten cards per week, which means that most days’ study takes less than two minutes. (This is good, because my own enthusiasm now begins to flag by the two minute mark.) I give very short quizzes on the fragments so they can do well on them and see how a little study can have a big payoff. (Remember that most of my students don’t ever study on their own.)
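As a rough illustration of why the card rate matters so much, here is a minimal sketch that counts how many reviews fall due each day under a toy expanding-interval schedule. Everything in it is an assumption made for illustration: the function name `daily_review_counts`, the doubling intervals, and the premise that every review succeeds. It is not Anki's actual scheduling algorithm, and because it ignores failed cards it understates the real load.

```python
from collections import Counter


def daily_review_counts(new_cards_per_week, total_days, first_interval=1, multiplier=2):
    """Count reviews due on each day under a toy expanding-interval schedule.

    Hypothetical simplification (not Anki's real scheduler): every review
    succeeds, and each success multiplies the next interval by `multiplier`.
    """
    due = Counter()
    for day_added in range(total_days):
        # Spread the weekly quota evenly across days using integer arithmetic,
        # so 10 cards/week becomes 1 or 2 new cards on any given day.
        cards_today = ((day_added + 1) * new_cards_per_week) // 7 - (
            day_added * new_cards_per_week
        ) // 7
        if cards_today == 0:
            continue
        interval = first_interval
        review_day = day_added + interval
        # Schedule every future review of today's cards inside the window.
        while review_day < total_days:
            due[review_day] += cards_today
            interval *= multiplier
            review_day += interval
    return [due[day] for day in range(total_days)]


if __name__ == "__main__":
    twelve_weeks = 84
    heavy = daily_review_counts(new_cards_per_week=49, total_days=twelve_weeks)  # about 7 cards/day
    light = daily_review_counts(new_cards_per_week=10, total_days=twelve_weeks)  # about 10 cards/week
    print("peak reviews/day at ~7 new cards/day: ", max(heavy))
    print("peak reviews/day at ~10 new cards/week:", max(light))
```

The two rates compared are the ones mentioned in this essay: roughly seven new cards a day in the first year versus about ten a week now. The sketch only shows relative load; how many seconds a class spends per card is a separate question it does not try to answer.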

I’m still using Anki, with different profiles for each class. I run the review in a call-and-response style, where I show and say the card, and they know to simply shout out the answer. On a good day, it becomes a kind of chant. The number, speed, accuracy, and confidence of the responding voices tell me which button to press, and there’s usually a bellwether student I can listen for as I make my decision. Because I’m striving for very high automaticity, I almost always press either 2 (the shortest affirmative next-study delay) or 1 (the negative start-it-from-scratch button).

My students mostly like the call-and-response flow, as archaic as it sounds, and I will refer you to an older footnote about that time I observed a traditional one-room Mennonite schoolhouse:

I once had the privilege of observing part of a lesson in a traditional Mennonite one-room schoolhouse. I don't speak a word of Low German, but it was clear the kids knew whatever it was they were drilling as they stood up and recited together. Most striking was the fact that they were all on the same page. There were no stragglers spacing out, slumped over, dozing off. The teacher could confidently build up to whatever came next without fear of leaving anyone behind.

For at least a minute or two every day, even worldly American kids can enjoy the routine. As I put it elsewhere in that Second Year report, “They enjoy the validation they get with each chance to confirm that they remember something. They enjoy going with the flow of a whole class doing the same thing. They enjoy the respite of learning on rails for a change, without any expectation that they take initiative or parse instructions.”

It probably goes without saying, but this call-and-response format only works well with cards with a very short answer that can be recalled very quickly. This is why I now only use SRS for word fragments. If I taught a foreign language, or even a lower-grade reading class with more basic vocab words, I would be using it more. My wife taught high school Spanish for a number of years, experimented with SRS, and is on the record as saying Duolingo deserves to eat the world. Anyone she could get to use it independently didn’t really need her class to do well on the final assessments.

After the state test, my students will forget almost all of their word fragments. That is the way of things. Ashes to ashes, circle of life, or, to get back to my controlling analogy, “All drains lead to the ocean, kid.” What I’m hoping will remain is an updated appreciation for what a little regular study can do, and a vague recollection that there are these apps out there that are, you know, like smart flash cards, that make it fast to memorize stuff.

Against apathy, toward apprenticeship

I’m nearing the end of Part 1, which means I’m nearing the end of my labors on this post, since Part 2 was mostly written five years ago. As writing projects go, I have found this one extraordinarily difficult. Over the course of its creation, I have pooped five times. It wants to be a book (or at least a blog), as everything I say tries to come out as a chapter of explanation having little to do with SRS.[5]

Well, I’m now going to indulge in several paragraphs where I don’t tie it back to SRS, so I can tell you the story of how I reinvented myself after my third year of spaced repetition software in the classroom. This included moving to a new school where I would have greater freedom to pursue my evolving views about learning. For what it’s worth, this story at least starts with SRS.

You see, it was during those dangerously long classroom Anki sessions six and seven years ago that I honed my sensitivity to students’ moods, to my own mood, and to how these feed off of each other. Sustaining a session without losing the room was like magnetically confining hot deuterium plasma — dicey, volatile, but occasionally, mysteriously, over unity.[6] I came to view anti-apathetic moods as a kind of energy that can be harnessed to do work and to create new energy.

Apathy, you may recall, is the true enemy. I’ve always known that. I called her out five years ago[7], but soon came to realize I had been fighting her on the wrong front.

I had been preoccupied by the fact that students who don’t care won’t activate enough of their brain to get any benefits from our daily review. To be fair, that is a problem, if I’m trying to prime them for success at a Day X event. But the more insidious issue is that a student in the thrall of Apathy won’t be churning their mental gears on any of the content I may have tricked them into learning, which means they’ll just forget it all without having made any lasting changes to their models. That’s not just an Anki-time problem. That’s an all-the-time problem. If they don’t engage with anything, they don’t keep anything.

I set off on a holy quest for anti-apathetic energy.

My errantry led me, for a time, to study stand-up comedy, not just because humor creates energy, but because a big part of that craft is an acting trick where you deliver incredibly polished lines in a way that sounds like you’re coming up with them right there in that moment.[8] Perceived spontaneity is a powerful source of energy even more versatile than humor.

I don’t know if I learned much about scripted spontaneity that I could articulate, but I felt like some of it rubbed off on me just by watching the experts closely over extended periods. And you know what? A lecture isn’t so different from a bit. A lesson isn’t so different from a set. A single changed word, a half-second delay, a subtle shift in facial expression can completely change the way the moment feels to the audience, er, class. And like a comedian workshopping new material on the road, I could use the fact that I might teach the same lesson five times in one day to test variations, trying to provoke more engagement, better questions, bigger laughs.

Equally important: I recognized that the process of refining the performance art was fun for me, and that my own engagement was the most powerful source of classroom energy. I could transmit it to my students, and maybe even get some energy back from them while I directed some of it into activity that would get their mental gears turning. Instead of burning out, I could burn brighter, and longer. On a good day, it became self-sustaining. On a great day, it could go supercritical, sending me home after my last class with my head spinning in a buzz of positive vibes and deep thoughts.[9]

During this same era, as part of my ongoing study of creative writing, I was binge-listening to interviews with television writers. One pattern that struck me was that it wasn’t too uncommon for someone to just kind of find themselves working in that highly rarefied field simply because they had spent a lot of time around others who were already doing it. Without any organized instruction, they picked up on how it worked.

Did you catch it? That was twice that I had noticed how arcane expertise can rub off on people through prolonged proximity. That got me thinking about the German Apprenticeship Model, and its medieval — nay, prehistoric — roots. It’s how we used to learn everything, right? We followed mama out to the berry bushes, and papa out to the hunting grounds. The fact that it seemed to work for television writers told me that apprenticeship wasn’t just for blue collar skills.

So, with the longer leash I enjoyed under my new bosses, I decided to move my instructional style closer to something resembling an apprenticeship where I mentored groups of 20-30 padawans in my arcane expertise.

Yeah, I jumped on a trendy meme. Note my careful word choice: ‘show’, not ‘tell’. This, to me, is the defining action in mentor-apprentice relationships.

By switching schools, I lost my interactive whiteboard. So I replaced it with something even better: an extra computer on a makeshift stand-up desk (a narrow kitchen prep cart with fold-out boards). A cheap second-hand monitor could face me while I mirrored that screen to the projector. Now I could do what I had seen coders do at instructional meet-ups: face the class while typing.

This meant I could show students what I do as a writer in real time, thinking out loud and watching their reactions as I typed. This could easily bore them, of course, but with strong energy-fu, old-school touch typing speed, and face-to-face interaction, I can pull it off more often than you might expect. On a good day, they find it fascinating. On one very special occasion each year, I do it for the full period, writing a 400+ word essay from scratch in 40 minutes with no prior knowledge of the prompt. Students have to hold their questions that day, and instead take observation notes, which become fodder for an extended debriefing discussion the next day.

The most important thing I’ve learned from those debriefings is that everyone can pick up something from a holistic demonstration like that, regardless of their skill level.[10] An advanced student might ask about my bracket substitution of a pronoun in a quote. An average student might say, “You used a lot of small and medium-sized body paragraphs instead of three big ones.” A sub-level student might say, “You didn’t like it if you used the same word too soon after you used it before.” And I always seem to get at least one surprising question about something I never would have thought to teach them, like, “How did you suck words into your cursor?” Then I’m like, “Oh, let me show you the difference between the Backspace and Delete keys…”

Did I make them memorize anything with that “lesson”? Nah. Did they make lasting updates to their mental models? Probably! Are you thinking of asking me, “But how do you test them on it?” Because if you are, then you really haven’t been paying attention!

There’s plenty more to be said about apprenticeship, but I think you get the idea, and this is still nominally an essay about classroom SRS.

If I had to summarize my self-reinvention in too many words, I would say that I’m now optimizing for “good days” at the high-energy intersection of “engaging for me”, “engaging for them”, and “conducive to lasting and worthwhile updates to their mental models”, with less regard for curricular scope and sequence.

In practice, this means… well, a lot of things. But it’s time I pinch off Part 1. That, “or get off the pot,” as they say.

Part 2: A Third Year of Spaced Repetition Software in the Classroom (2017)

[In this excavated report, text in brackets is commentary I’m adding in 2021. Anything outside the brackets is direct from my 2017 draft, or constructed from my notes to fit the perspective I had at the time.]

Synopsis and disclosure

I tried the obvious thing this year. Instead of game show-style whole-class front-of-the-room Anki, I arranged for every student to be able to independently study material I created in Cerego, both in and out of class.

Disclosure: Cerego provided me a free license for the year in exchange for some detailed feedback, which I gave them. This feedback was mostly about user interface issues and reports, the latter of which required some ugly scripting on my end to get numbers I found useful. [As the Cerego team seemed to be rapidly iterating, I imagine they have made many changes and improvements to their app since 2017, though I have not used it since. Please keep this in mind as you read these years-old notes.]

Despite many small hang-ups, I was pleased with Cerego’s features and reliability. In exchange for a great deal of up-front effort, it gave me a unique window into student engagement and progress. Consequently, it proved to be an overwhelmingly potent tool for winning “the blame game”, although I eventually came to feel uneasy about using this power.

Longer-term learning outcomes seemed, on average, to be slightly worse than with the whole-class Anki method. While highly motivated students benefited from being able to study more aggressively and efficiently than before -- and their objective scores were higher than ever -- their learning seemed less transferable to more authentic contexts. Students of lower motivation, while seeming to get little from either approach, got even less from this digital 1:1 method, and their slump accounts for the overall decline.

Setup

I taught a mix of regular (not honors) 9th and 10th grade English classes again, but over the summer of 2016 I was invited to move my classroom into an unusually spacious converted computer lab in which 16 older desktop PCs were kindly left at my request. I had these arranged facing the sides of the room so I could see all screens easily. I allocated those PC seats on a semi-permanent basis as needed and requested. The balance of students sat at normal desks and used their phones for study.

This came with challenges. School WiFi was officially off-limits to students (though many always had the password anyway), and many students said they were at the whim of data caps they regularly pushed up against. Their phones, in most cases, were a generation or three behind state-of-the-art, with degraded batteries and exhausted storage capacity. A few students had difficulty even making room for the Cerego app that first week.

While our setup was marginal, between the PCs and phones, we only rarely ran into a situation where not everyone could be studying at the same time.

On the software side, it must be said that, for all its features, Cerego wasn’t designed for my specific use case. The company’s featured customers are businesses and colleges, who use the product as part of packaged training programs and distance learning courses. Importantly, the app favors adding content into the learner’s study rotation in blocks, on the learner’s own schedule, rather than making it on the fly and trickling it immediately. It was also not designed to give a teacher “panopticon”-style real-time monitoring, nor to thwart adversarial users who want to look studious without studying.

Procedure

Before the start of each school day, I would consider the previous day’s lesson content and add to the relevant Cerego study sets as appropriate. This process could be lumpy and not necessarily daily; some lessons invited a great deal of suitable content, and others none at all. Content additions were also far more common first semester than second semester, as I intentionally front-loaded material to maximize the time we would have to reinforce and apply it. During an average week where I added cards, we probably averaged about 50 additions. [ ! ]

With a prominent timer at the front of the room, I allocated 10-12 minutes at the start of every 57 min class period as specially designated “Cerego Time”. During Cerego Time, I would periodically patrol the room to ensure students were on task and to provide support.

Students were allowed to read a pleasure-reading book during this time instead, if they chose. This allowance was most obviously meant for anyone with extra time after catching up with their study, but I wasn’t about to interfere with any teenager reading a book of their own volition. Not all regular readers (2-5 per class) were conscientious Cerego-ers.

Students were strongly encouraged to also use Cerego outside of class whenever the app recommended, if they wanted maximal retention for minimal time spent.

About once a week, usually without warning, I would give a ten-question multiple-choice quiz, done digitally in Canvas, that could include questions taken directly from any content that had been in Cerego for at least a week, no matter how old. Before I put the grade into my book, I would add a 10% adjustment (not to exceed 100%), respecting the wisdom that aggressive study sees diminishing returns as one approaches a goal of 100% retention on large bodies of knowledge. My students were aware of this free 10% and my reasoning behind it.
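The capped adjustment can be sketched in a few lines. This is only an illustration: the function name `adjusted_quiz_score` is made up here, and it assumes the "10%" means ten percentage points added to the raw score rather than a 1.1x multiplier, which the text does not specify.

```python
def adjusted_quiz_score(raw_percent: float, bonus_points: float = 10.0) -> float:
    """Add a flat bonus to a quiz percentage, capping the result at 100.

    Assumption: the bonus is additive (percentage points), not multiplicative.
    """
    if not 0.0 <= raw_percent <= 100.0:
        raise ValueError("raw_percent must be between 0 and 100")
    return min(100.0, raw_percent + bonus_points)


# On a ten-question quiz, each question is worth 10 points.
for correct in (6, 9, 10):
    raw = correct * 10.0
    print(f"{correct}/10 correct: raw {raw:.0f}% -> recorded {adjusted_quiz_score(raw):.0f}%")
```

Under this reading, a student who misses one question out of ten still records a perfect score, which matches the stated goal of not punishing anyone for falling slightly short of 100% retention.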

To account for students just joining my class at the start of second semester, and for those who inevitably studied nothing for the seventeen calendar days between semesters — and even for those simply desperate for a fresh start — I had a lengthy grace period of sorts in January and February. Older stuff was temporarily not included in the “quizzable” question pool. I posted dates for when I would consider each old set fair game again; every week or two, a set would find itself back in the pool according to this schedule, and stay there for the rest of the year.

I did not use Cerego stats directly for any kind of grade, instead using my Canvas quizzes for this. My reasons:

Points of friction

This section is not a critique of Cerego specifically, but rather a reminder that classroom technology is not inherently good. The mythical 1:1 student tech ratio doesn’t suddenly make impossible dreams reality, and in fact comes with ongoing costs that must be weighed against the benefits. Here were some points of friction I encountered:

All of the above adds up to a kind of tax on your time and energy, even when you have enough respect from your students to minimize deliberate abuse. (I had maybe 2-3 bad eggs during the year committing occasional acts of minor sabotage.) Moreover, every possible point of friction becomes amplified by a student who doesn’t feel like getting to the objective, like a child who finds an hour’s worth of yak shaving to do whenever bedtime rolls around.

Problems with multiple-choice study cards

Unlike Anki and other personal-use SRS, where the user self-assesses performance and collaborates with the app to schedule the next review, apps like Cerego are built to measure retention objectively. This changes how study cards have to be constructed. Although options [even in 2017] are varied, the most practical and straightforward method is usually a “front” side card with a question or term and a “back” side of multiple-choice responses.

Some problems with multiple-choice format:

I gave my Cerego contacts some ideas I had for minimizing some of the downsides of multiple-choice. Because my students were largely deaf to my pleading that the “front” card screen — the one containing only the question — is where the learning actually happens, there could be a mandatory (or at least default, opt-out) short delay on that screen, especially when the app detects inhumanly rapid clicking. 

Cerego actually asks “Do you know this?” on that screen, giving them a chance to self-assess in the negative without going to the multiple choices, but the vast majority of students never saw this screen as anything but a speed bump to click through.

My thought was that Cerego could occasionally not show the multiple choice options right away when they click “I Know It”, but instead call their bluff, asking, “Oh? How confident are you?” and prompting them to select a confidence level on a slider bar before showing the choices. Not only might this end the bad habit, it could also provide an opportunity to help them with their credence calibration, a useful skill that might make them better thinkers and learners. I also suggested Cerego might be able to use this data to learn more about a learner and better judge their mastery level through sexy Bayesian wizardry.

[My aborted app design would have taken that concept to its logical conclusion: letting trusted users fully self-assess most of the time, but occasionally performing “reality checks” where it made the user respond in a way it could verify. It could then use straightforward Bayesian updates from these checks to decide how often to do them for each user.]
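To make the "reality check" idea concrete, here is a minimal sketch of one way such a scheme could work. Nothing here comes from the aborted app or from Cerego: the class name `TrustEstimate`, the uniform Beta(1, 1) prior, and the rule mapping estimated accuracy to a check rate are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrustEstimate:
    """Beta(alpha, beta) belief about how often a user's "I know it" is accurate."""

    alpha: float = 1.0  # prior pseudo-count of verified claims (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failed claims

    def update(self, claim_was_correct: bool) -> None:
        """Bayesian update after one verifiable reality check."""
        if claim_was_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean probability that a self-assessment is accurate."""
        return self.alpha / (self.alpha + self.beta)

    def check_probability(self, floor: float = 0.05, ceiling: float = 0.9) -> float:
        """Chance that the next self-assessed card gets a reality check.

        Hypothetical policy: check in proportion to estimated inaccuracy,
        but never below `floor` or above `ceiling`.
        """
        return min(ceiling, max(floor, 1.0 - self.mean))


user = TrustEstimate()
for outcome in [True, True, False, True, True, True]:
    user.update(outcome)
print(f"estimated accuracy: {user.mean:.2f}")
print(f"reality-check rate: {user.check_probability():.2f}")
```

The floor keeps even highly trusted users from going entirely unchecked, so a student who starts bluffing later would still be caught eventually.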

New failure modes

New format, new failure modes:

Dominating the blame game

Teachers are regularly asked by their bosses how they are “differentiating” instruction, adjusting lessons for students across a class’s range of skill levels, learning disabilities, and language deficiencies. They are also asked by parents what their children can do to improve their grade.

Cerego gave me a ready answer to both questions: “Well, in my class we use a free study app that I load with all of the terms, vocab and such that could be on my quizzes. It’s like smart flash cards that let you know when you need to study to avoid forgetting things. They adjust to give you more practice with the things you struggle with. Not only do I provide time to use it during class -- even providing a computer if they need it -- but it works on any internet device. Students can use it as often as they like to be as prepared as they want to be.” Nobody ever complained about this answer, and some were quite impressed with it -- more than I was, to be honest.

I also had powerful ammunition in the all-too-common scenario where, at a meeting with all of the child’s teachers, a parent blames poor grades on the teachers’ not adjusting to their child’s very special needs, instead of on their child’s ridiculously obvious laziness.

We can’t, of course, just come out and call it like we see it. But we can show parents our data and let them connect the dots. So, in these cases, I would just repeat my “Well, in my class we use a free study app…” spiel, emphasizing the “as prepared as they want to be” part. I would then add, “According to the app, your child has spent [x] minutes studying over the last week, which is about [y]% of the time my average ‘A’ student spent in that same period, and, come to think of it,” I would say, scratching my head for effect, “far less than the time I provide in class for it.”

Cue evil gaze from parent to child, squirming discomfort from child, envious awe from my fellow teachers.

It’s true! Here is a snapshot of one type of output I collected from my report-processing scripts for one of my students. You’re looking at one block of a larger data sheet I brought to parent meetings and included in periodic emails sent home. This one was for a fairly average student who put in the minimum expected time but didn’t push themselves very hard. A slacker’s would be more brutal.

Like I said, absolute dominance.

But like a lot of games, beating the “blame game” just made me tired of playing it, and ready to move on to something else. The enemy is not the apathetic student. The enemy is Apathy herself. I want to teach the lazy student, not destroy them with my Orwellian gaze.

Results and discussion

Table

In the following table, n=129, the sum of the 9th and 10th grade students that finished second semester with me. The procedures were identical in both grades, and I didn’t find much reason to divide them, preferring the larger total sample. I then divided the combined sample into quintiles as shown:

The "Sem 2 Grade" is their course grade from just the second semester, but the other stats are all cumulative for the year. (No, I don’t have any state test data for this group, and I never will. Having switched employers, I am not privy to the results, which arrive in late summer or early fall.)

“Set Level” is Cerego’s signature rating of overall progress and retention, on a 4-point scale.

“% of Cards Started” is the fraction of the total cards I had prepared that the students had added into their rotations. (Remember that Cerego did not do this automatically.) For 9th graders, there were 648 cards. For 10th grade, there were 749.
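For readers who want to reproduce this kind of table from their own exports, the two derived quantities can be sketched as follows. The card totals come from the text; the function names and the exact rule for splitting 129 students into five nearly equal groups are assumptions, since the original scripts are not shown.

```python
TOTAL_CARDS = {9: 648, 10: 749}  # cards prepared per grade level, from the text


def percent_cards_started(cards_started: int, grade: int) -> float:
    """Share of the prepared cards that a student added to their rotation."""
    return 100.0 * cards_started / TOTAL_CARDS[grade]


def split_into_quintiles(students_sorted: list) -> list:
    """Split a list already sorted by grade into five nearly equal groups.

    Assumption: group boundaries are rounded to the nearest whole student.
    """
    n = len(students_sorted)
    bounds = [round(i * n / 5) for i in range(6)]
    return [students_sorted[bounds[i]:bounds[i + 1]] for i in range(5)]


# Placeholder roster of 129 anonymous IDs, standing in for the real class list.
roster = list(range(129))
print([len(group) for group in split_into_quintiles(roster)])
print(f"{percent_cards_started(324, 9):.1f}% of 9th grade cards started")
```

Because 129 is not divisible by five, the groups cannot all be the same size under any splitting rule.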

Study time analysis

As a sanity check, I crudely estimate that we had study time on 160 of our 180 school days, spending an average of 11 minutes each time. That would add up to 29.3 hours of total in-class study time. That the actual averages are lower does not surprise me, due to a combination of absences, roster changes, and start-up times. What we can conclusively say is that there was not a massive amount of outside-of-class study going on.
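The sanity check is simple arithmetic, spelled out here with the two estimates from the paragraph above as named constants (the variable names are just for illustration).

```python
STUDY_DAYS = 160          # estimated days with Cerego Time, out of 180 school days
MINUTES_PER_SESSION = 11  # estimated average length of each in-class session

total_minutes = STUDY_DAYS * MINUTES_PER_SESSION
total_hours = total_minutes / 60
print(f"{total_minutes} minutes = {total_hours:.1f} hours")  # 1760 minutes = 29.3 hours
```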

Of course, not all of those logged study minutes were productive study time. It wasn’t always clear to me when Cerego counted a minute towards study vs. idle, or whether it detected idleness at all on the mobile app. Indeed, there were several cases where a student’s mobile app seemed to have logged continual study overnight, and even, in one case, for multiple continuous days. The above chart has not been adjusted for known or unknown anomalies of this kind.

Regardless, as you can see, while time spent studying was correlated with performance, there was barely a 25% difference in study time separating the top and bottom grade quintiles. Even this is less exciting than it looks, as the lowest scorers were also more likely to be absent, missing their in-class study time. I have made no effort to adjust for this.

One thing you can’t see in that chart is the high variance that existed within the top quintile. In this group, time spent studying varied from 33 hours to 12 — and 12 was the top student! Anecdotally, I perceived two distinct subgroups of high performers: highly motivated learners who had a natural disadvantage, like being a foreign exchange student speaking a second language, and high IQ avid reader types. The former put in far more hours than the latter. In fact, that second group put in less time than the average bottom quintile student.

Only a very small number of highly motivated students showed signs of studying over weekends and breaks.

SRS signal, or just conscientiousness?

While you can see a much stronger signal in the “Set Level” and “% of Cards Started” columns, it’s hard to know how much this is just measuring conscientiousness. Good students are going to do what they’re asked to do, and get the good grade no matter what, but this doesn’t mean that what they’re asked to do is always necessary to get the good grade — or that the grade reflects anything worthwhile in the first place.

People persons

At least a few of the students I could never get to study Cerego were very on-the-ball whenever we did any kind of verbal review. 

[I’ve seen a lot of this pattern during the pandemic. Students who seemed like inert lumps online, with very low grades, have in many cases returned to the classroom and revealed themselves to be dynamic and invested. An engaging human at the front of the room really is the “value add” of in-person instruction. This is something I encourage my peers to keep in mind whenever deciding between autonomous work and teacher-student interaction.]

High automaticity in high achievers

When it came to automaticity, outlier results were more impressive than ever. The very small number of students at the overlap of highly motivated, highly intelligent, and highly competitive absolutely crushed it in the review game we regularly played at my interactive whiteboard, beating me on several occasions, which almost never happened previously.

Weak transference?

However, transference to other contexts was less evident. In my first report, I had remarked on anecdotal impressions [LW · GW] of higher-quality discussion and essay responses from those who had embraced our Anki review, suggesting that they had truly enlarged their lexicon to be able to talk about more complex ideas. I saw less of that this year. I don’t know what that means. It could just be that this mix of students was less open with their thoughts. But I can also see how they may have seen the Cerego universe as distinct from the universe of essay and discussion. Whole-class Anki might be more resistant to this bifurcation by making us say the words out loud to each other, normalizing their use.

Drama benchmark analysis

To compare methodologies as directly as possible, for a third year running I handled my Drama unit the way I accidentally had during my first year of classroom SRS: some terms taught before the pre-test, most taught after the pre-test, an identical post-test much later, and no review of any of it except through the SRS.

The overall results in the Drama unit were slightly worse this year. This was surprising. This cohort started lower on the pre-test, which was consistent with my impression of them, but I predicted that we would at least match or exceed last year’s gains, as we had more room to improve. We did not. Retention of some reliable bellwether terms actually dropped prior to the post-test. In picking through individual scores, my impression was that whole-class Anki and independent in-class Cerego were statistically equivalent for motivated learners, but whole-class Anki won easily with less motivated learners. As always, there were plenty of truly unmotivated students who got nothing from either method.

I tried to tease this out even further. This was pretty unscientific, but I took the pre and post-test scores of twenty students from last year, and aligned them individually to students from this year with similar pre-test scores and, in my view, similar work ethics. Highly motivated students starting very low may have done slightly better with Cerego than with Anki, but poorly motivated students starting low did somewhat better with Anki. 

I’m sure a lot of this came down to how Cerego makes new card sets “opt-in”. Students of lower motivation were less likely to encounter the Drama terms in their study rotation at all!

Phone vs. Computer seemed to make a difference here, too. Stuck with a very visible PC, some low performers would occasionally have good days and get in a groove. The ones glued to their phones found anything to do except Cerego.
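The informal matched-pair comparison described above could be approximated in code along these lines. This is a sketch, not the procedure actually used: the matching was done by hand, "work ethic" was a judgment call (represented here by a crude `motivation` label), and the scores below are made-up placeholders rather than class data.

```python
from statistics import mean


def match_by_pretest(last_year: list, this_year: list) -> list:
    """Greedily pair each of last year's students with the unused student from
    this year who has the closest pre-test score in the same motivation band."""
    available = list(this_year)
    pairs = []
    for old in last_year:
        candidates = [s for s in available if s["motivation"] == old["motivation"]]
        if not candidates:
            continue  # no comparable student left; skip rather than force a match
        best = min(candidates, key=lambda s: abs(s["pre"] - old["pre"]))
        available.remove(best)
        pairs.append((old, best))
    return pairs


def mean_gain_difference(pairs: list) -> float:
    """Average of (this year's gain minus last year's gain) over matched pairs."""
    return mean((new["post"] - new["pre"]) - (old["post"] - old["pre"]) for old, new in pairs)


# Placeholder scores only, to show the shape of the data.
anki_year = [
    {"pre": 40, "post": 70, "motivation": "high"},
    {"pre": 35, "post": 55, "motivation": "low"},
]
cerego_year = [
    {"pre": 38, "post": 72, "motivation": "high"},
    {"pre": 36, "post": 48, "motivation": "low"},
    {"pre": 60, "post": 80, "motivation": "high"},
]

pairs = match_by_pretest(anki_year, cerego_year)
print(len(pairs), "matched pairs; mean gain difference:", mean_gain_difference(pairs))
```

With only twenty hand-matched pairs, any difference computed this way is an impression to investigate further, not a statistically robust result.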

Conclusions (2017)

If I see students as being ultimately responsible for their own learning, independent Cerego is the fairer approach that will help students get what they “deserve”. If I see things in more pragmatic and utilitarian terms (as I do), the numbers favor the whole-class Anki approach. And yet...

If I were staying at that school, with my classroom computers, I would have tried to get the best of both worlds. It was my plan to use Cerego again — having already done most of the legwork — and try to make it friendlier, with more teacher interaction, supplementing with some whole-class Anki. I would have pushed Cerego’s developers to make some of my most wanted changes, and I would have pushed myself to cut back on the number of cards I used.

But it’s moot, now. I won’t have computers at my new school. And part of the reason I left was that I didn’t like the feel of the groove I was settling into.

Whole-class Anki review wins for simplicity and camaraderie. Cerego wins for surveillance and power. Which would you want to see stamping on a teenage face forever?

Trick question! It’s not nice to stamp on faces. I feel like I’ve been pushing SRS too far past the point of diminishing returns, and I don’t know why it has become an annual tradition for me to vow to cut back next year and then fail to do so. I should probably break that cycle. Apathy is the enemy, and she remains unbowed. I’ve been looking for a technological fix, but I think the solution is, at best, only partly technological.

[My notes here spiraled off into very technological solutions (sigh) to add to my dream SRS+ app, which I had already postponed again but still wasn’t ready to abandon. I suppose I can give myself a little credit for brainstorming features to encourage human interaction and conceptual connections. Eventually, my notes came back to some thoughts about what makes a class thrive, which I have translated into coherent sentences below.]

From a scalability standpoint, it’s nice that something like Cerego doesn’t depend on a teacher’s charm the way my whole-class Anki approach does. Teachers could do a lot worse than a standardized pack of quality Cerego sets that reinforce matching cookie-cutter lessons. But couldn’t teachers also do better? I think I could do better. Cerego and Canvas quizzes create distance between me and my students. But I want to bring us closer and dial up the enthusiasm.

I don’t think gamification is the answer. I’ve been noticing that the appeal of games is pretty niche, failing to capture many from the apathetic middle, and then for the wrong reasons, with the wrong incentives.

So what would work?

In education research, it always looks like everything works at least a little bit. This is probably a combination of publication bias and the fact that teachers sometimes get excited to try something new. Excitement is infectious. This gets students more engaged, which then improves outcomes. My early success with classroom SRS — and subsequent disappointments — would certainly fit that pattern.

Maybe I should make a point of trying new things each year for the explicit purpose of exploiting the excitement factor? How would I explain that to my bosses? “Well, I deliberately diverged from the curriculum and accepted best practice because I grew weary of them.”

[Yes, actually. My new bosses are great that way.]

Thesis, Antithesis, Synthesis (2021)

As a student of storytelling, I can’t help but find an arc to my fourteen years of teaching up to this point.

When I first started out, I didn’t know what I was doing but kept Apathy at bay through sheer passion. I worked harder than anyone. I couldn’t wait to try my stuff out, and students responded to all but my cringiest overtures.

When this inevitably exhausted me, I had a hard slump. Lessons that used to work fell flat. I still didn’t know what I was doing, and now lacked sufficient passion to brute force success. So I retreated into systems and structure, building word banks, prompt banks, quiz banks; rubrics, charts, and randomizers; running reports; slinging code. A suit of high-tech power armor to augment my feeble form. A different kind of brute force.

My systems gave me stability and staying power, and, eventually, the confidence to explore. My three years of heavy SRS experimentation were the culmination of this phase. I stretched. I grew. But I still felt plateaued and frustrated, perhaps having taken systems as far as they could go.

Apathy still mocked me from her emoji throne.

I step out of the armor and find I no longer need it. One by one, my systems clatter to the ground. I know who I am. I know where my power comes from. And I know my enemy.

She will lose, because she is overconfident. She won’t prepare, because she is indifferent. And she won’t hear my warning, because I issue it now in the one place I know she’ll never reach: the bottom of a 10,000 word essay.

I’m coming for you.

37 comments

Comments sorted by top scores.

comment by lejuletre · 2021-03-05T00:15:25.843Z · LW(p) · GW(p)

Hello ! I am Jaime who recognized your username on ACX and requested you publish this retrospective. Thank you so much for this; I found it very insightful and very helpful for my research.

I'm currently writing my thesis on spaced repetition in foreign language classrooms, and am planning to become a secondary school French teacher. The curriculum I'm writing integrates spaced repetition into the material reviewed on a given day, without using flashcards, and was heavily inspired by the model in this post. I have no idea if it'll work, but at the very least part of my thesis is creating a bunch of lesson plans, so I figure the worst possible outcome here is that I was wrong about everything but at least I made up some fun activities along the way.

I've suspected for a while that one problem with school is that they try to get you to learn too much information, so it's nice to hear from an experienced teacher that I'm probably on the right track.

Your commentary on apprenticeship is interesting; I remember that, when I was learning French in middle and high school, I was deeply impacted by the few times when another French teacher would come chat with my teacher, in French. It was a rare opportunity to witness a fluent conversation in real time. Something I want to prioritize in my French classroom is bringing in guest speakers. I've previously assumed that the primary goal would be to evangelize practical applications of French (which I'm very aware are limited; but you can see how this sort of thing would be very relevant in a Spanish or Mandarin classroom). Maybe a secondary or even primary goal would be the opportunity for students to watch a fluent French conversation in real time; lacking the spontaneity I witnessed between French teachers as a 7th grader, the guest speaker and I could make a specific effort to use relevant and known vocabulary. I'm not sure of other ways that apprenticeship-style teaching could be incorporated into a foreign language classroom, but I'm very interested in researching further.

Replies from: tanagrabeast
comment by tanagrabeast · 2021-03-05T01:44:56.448Z · LW(p) · GW(p)

Experts talking shop with other experts is one of my favorite finds when I study!

During my dive into stand-up comedy, I came across this video of some top comedians talking shop. Especially from about the 30 minute mark, when they seem less concerned with entertaining their audience, they get into some juicy minutiae of why a joke might work or not. It really expanded my thinking on the subject.

Are such chats more insightful than an expert teacher would be in a lesson on that same topic? Not necessarily. But you might not find a skilled teacher ever teaching a lesson on that exact topic. I think humans are naturally primed to closely observe expert-expert chats for a few reasons:

Social proof. We instinctively want to be able to talk like the experts do so we can blend in with them. So we listen carefully to how they talk. 

Authenticity. If this is what experts actually talk about, we feel like it must really matter. It's not just the lesson of the day.

The overhearing effect. This is a term I'm making up, but I've found it to be an important one exploited by storytellers. We naturally want to deduce the context of overheard language, so we listen extra carefully, trying to fill in the blanks. I suspect this is down to humans' highly evolved appetite for gossip. The fact that the experts aren't talking to us is essential for exploiting this effect.

Although... I find that an expert talking to himself, seemingly unguarded, seemingly without conscious awareness that he is being overheard... can also trigger the overhearing effect. When I model a skill to my students, I try to verbalize my inner monologue in a way that will be intriguing to overhear and carry that essential whiff of authenticity.

I'm not sure what expert self-talk looks like in foreign language instruction, but I would be interested to find out. (Any ideas?)

But from my time becoming a reasonably fluent Spanish speaker (since lost), I can describe a few language dimensions I found interesting but neglected by all but the nerdiest supplemental books.

  1. Sentence-level inflection patterns vary, and it helps to be aware of them. For instance, the musicality of a typical question sentence is different in American English than in Castilian Spanish. If you can pick up on the melody earlier in the sentence, you can better contextualize what is being asked.
  2. The way speakers in different languages produce what seem, on the surface, to be identical phonemes, can be quite different, and understanding this is essential to actually sounding like a native. There can be hours of fun trying to practice a Castilian 'toh' sound (as in toma), with its thicker top-front palate tongue contact, vs. the American English equivalent (as in tomato).
  3. Native speakers of language A learning language B often end up predictably adopting many of the same idioms and juicy words from language B into their language A conversations with each other, and they find themselves saying or thinking in those patterns even when their brains are mostly running language A. It could be fun to introduce some of these to novices and make it part of the language A classroom slang -- a kind of introduction to thinking in language B.
Replies from: lejuletre, PoignardAzur
comment by lejuletre · 2021-03-05T17:06:31.794Z · LW(p) · GW(p)

Do you think that instinctive drive to listen to experts "talk shop" applies to apathetic students, though? I worry that the chance to listen to "experts" (native speakers) chat at the front of the room would be too easily taken as a chance to tune out and relax (especially since what they're getting from the experience is more metadata about how a conversation in the target language works than any particular language content itself). I'm not sure how the "authenticity" rule applies either, for the same reason. I don't see how "We instinctively want to be able to talk like the experts do so we can blend in with them" would apply to a student with no desire to become an expert/fluent speaker. Do you think these are relevant questions, or do you think that the benefit of such an expert-expert interaction to highly motivated students would outweigh the uselessness to unmotivated ones?

My instinct for expert self-talk in foreign language instruction is just normal talking to oneself; "thinking aloud," as it were. I'm not sure the best way to demonstrate that to the class, since I think the most authentic scenario would be to put them in an immersive environment so they can figure it out. But if we had ready access to immersive environments then foreign language classrooms would be in much better shape.

WRT 2, yes, pronunciation is something that was never explicitly taught in my high school language experience, and I think it would help students build confidence in their speaking. I haven't yet figured out the best way to do this without teaching them IPA, but it's definitely something I want to incorporate. Even if they can get the vowel system down (ignoring tricky consonants like trilled R's and unaspirated T's) I think many students would be in a better boat than they are now.

Replies from: tanagrabeast
comment by tanagrabeast · 2021-03-06T02:39:12.604Z · LW(p) · GW(p)

Do you think that instinctive drive to listen to experts "talk shop" applies to apathetic students, though?

That's definitely the right question. If you and another expert leap straight into fluent French, no, I don't think your apathetic students will try to keep up -- especially if they are early beginners. More helpful might be a Franc-lish hybrid conversation where you swap stories of embarrassing errors and insights largely in English while sprinkling in French words and expressions, reenacting parts of colorful encounters from your combined French-speaking experience.

I also think one of the difficulties in modeling language fluency is that the whole point of being fluent is to not need to think about the language, but to simply think in it, so I'm not sure what your vocalized monologue would be about...unless...

Ok, here's a thought: I and the other motivated folks I learned Spanish with sometimes found ourselves slipping into a Spanglish patois outside of class where we spoke English with Spanish syntax. It felt like silly play at the time, but I now think it was an instinctive intermediate step to thinking in that language. 

"It makes rain." (It's raining.)

"To me pleases the rain!" (I like rain.)

Perhaps you could try fostering a Franc-lish dialect in your classes by thinking out loud in that style and inviting others to join you in banter, patiently nudging them to get the grammar right instead of just talking like Yoda. From there, substituting actual French with increasing frequency could feel very natural.

You may not have immersive environments, but I imagine you'll be creating simulated immersion: play-acting situations that give you a chance to think out loud as though you are navigating the moment for real. (Example: Going to the produce section of the store and seeing what looks good, what you could make with it, etc.) How much of that you should do in English, Franc-lish syntactic patois, or French will probably be something you will develop an expert instinct for as you become skilled at reading the room. Along the way, developing an entertaining stage presence for this play-acting would give you a powerful weapon against apathy.

Yes, yes... and you would be randomly involving students in your little improvised plays, assigning them roles, keeping them on their toes, making the non-participants want to get called on.

Yep, it sounds pretty awesome from the comfort of my not-having-to-teach-French perch :)

comment by PoignardAzur · 2021-03-21T01:35:34.323Z · LW(p) · GW(p)

It could be fun to introduce some of these to novices and make it part of the language A classroom slang -- a kind of introduction to thinking in language B.

There's a kind of slang that's like what you describe in r/france, where people will intentionally use idiomatic English expressions translated word-for-word into French in nonsensical ways.

E.g., people will say "je suis hors de la boucle" (I'm out of the loop) even though that sounds incomprehensible to someone who doesn't know the English idiom.

Some people get really annoyed about that pseudo-slang, though.

comment by habryka (habryka4) · 2021-03-04T07:23:38.417Z · LW(p) · GW(p)

Wow, it's great to see follow-up posts over the course of seven years. Thank you so much for the work you put into this! I am really looking forward to reading this thoroughly sometime in the next few days.

comment by tanagrabeast · 2021-03-04T20:22:41.590Z · LW(p) · GW(p)

Footnotes (each footnote is a reply to this comment)

Replies from: tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast, tanagrabeast
comment by tanagrabeast · 2021-03-04T20:27:27.086Z · LW(p) · GW(p)

[9] Do you want to know what I’ve hated most about teaching in person during the Covid-19 pandemic? The way mutual mask-wearing scrams my reactor. With my facial expressions concealed, my deliveries don’t land as consistently. With the students’ expressions concealed, I am deprived of the energy I would gain by getting a reaction out of them. The parts of the job that used to recharge me drain me instead. I don’t have words to describe how awful this feels.

comment by tanagrabeast · 2021-03-04T20:25:56.023Z · LW(p) · GW(p)

[3] A widespread bias I see in education is viewing every subject as a technical one with a straightforward dependency tree. Take my subject: English. The delusion held by seemingly all district-level curriculum czars is that, if Johnny’s reading scores are deficient, there must be one or two very specific dependencies he lacks. They will often look to a single wrong answer on a diagnostic test and say, “Ah! There it is. ‘Deducing the meaning of a word from context.’ Teacher, give them lessons on that until they master it.” 

Sorry. It doesn’t work that way. Johnny, like most humans, intuitively understands how to derive meaning from context. But in this case, he didn’t understand the context, because it’s one of the millions of things he’s naive about. He’s young and hasn’t read very many books. If we want to get reductive, I will concede the hypothetical possibility of making a shaggy graph of the millions of micro-dependencies that underpin an individual’s reading skill. But maybe we should just try to find Johnny some books he might like.

comment by tanagrabeast · 2021-03-04T20:28:23.490Z · LW(p) · GW(p)

[10] If you’re a fellow teacher, you know that this is the differentiation problem solving itself.

comment by tanagrabeast · 2021-03-04T20:27:17.448Z · LW(p) · GW(p)

[8] I remember the first time I appreciated this skill. It was when I saw this hilarious exchange between Louis CK and Conan O’Brien, and then saw the same content later as a bit in one of his shows (4:39). It seems embarrassing to have not seen it, but it hadn’t occurred to me that talk-show ‘interviews’ with comedians might sometimes be adaptations of their bits. Seriously, though, Louis CK really comes across as a spontaneously funny guy in that first clip. He elevates the convincingness of spontaneity into another layer of comedic art.

comment by tanagrabeast · 2021-03-04T20:27:02.449Z · LW(p) · GW(p)

[7] She goes by many names around the world. In the UK, teachers swap scary stories about Bore-a-trix Lestrange, Lady Macbarf, and Nary, Queen of Nots.

comment by tanagrabeast · 2021-03-04T20:26:42.975Z · LW(p) · GW(p)

[6] When it’s releasing more energy than you’re using to contain it.

comment by tanagrabeast · 2021-03-04T20:26:33.728Z · LW(p) · GW(p)

[5] This book would be somewhat redundant in a world where we already have David Didau’s What if everything you knew about education was wrong? I crossed paths with this title during a pensive season of my life and appreciated the way it asked questions from first principles, challenging orthodox assumptions without jumping to new conclusions. In particular, Didau had the words to express what I was feeling about forgetting.

comment by tanagrabeast · 2021-03-04T20:26:22.961Z · LW(p) · GW(p)

[4] Consider how a serial television show uses a “Previously, on [title]” to remind you of plot threads that are going to be relevant to this episode, some of which might be from several episodes back. This is superior to how they used to do it, which was “Last time, on [this show].” The primitive form would fail to remind you of relevant threads from older episodes and needlessly remind you of irrelevant threads from last week. When you review with your students, are you just reviewing the most recent stuff, or are you choosing the stuff that’s about to be relevant again?

comment by tanagrabeast · 2021-03-04T20:25:31.642Z · LW(p) · GW(p)

[2] You don’t have to justify yourself to me. I, too, have motivational and administrative reasons that keep me testing on occasion. But I approach and design those tests differently, when I can.

comment by tanagrabeast · 2021-03-04T20:25:06.401Z · LW(p) · GW(p)

[1] Neel Nanda beat me to a discussion of this. Worth a read. The comments are great, too. I was reassured that others like me with real experience, a little research, and rigorous thinking on the topic had reached such similar conclusions.

comment by Raemon · 2021-03-06T21:09:18.962Z · LW(p) · GW(p)

Curated. It's great to see such detailed notes on such a long project. I appreciated insights about Spaced Repetition in particular, as well as more general reflections on a longterm project and how you related to it.

comment by Kaj_Sotala · 2021-03-05T09:29:06.244Z · LW(p) · GW(p)

This meant I could show students what I do as a writer in real time, thinking out loud and watching their reactions as I typed. This could easily bore them, of course, but with strong energy-fu, old-school touch typing speed, and face-to-face interaction, I can pull it off more often than you might expect. On a good day, they find it fascinating. On one very special occasion each year, I do it for the full period, writing a 400+ word essay from scratch in 40 minutes with no prior knowledge of the prompt. Students have to hold their questions that day, and instead take observation notes, which become fodder for an extended debriefing discussion the next day.

Reminds me of Collins's cognitive apprenticeship.

... in traditional apprenticeship, the process of carrying out a task to be learned is usually easily observable. In cognitive apprenticeship, one needs to deliberately bring the thinking to the surface, to make it visible, whether it's in reading, writing, problem solving. The teacher's thinking must be made visible to the students and the student's thinking must be made visible to the teacher. That is the most important difference between traditional apprenticeship and cognitive apprenticeship. Cognitive research, through such methods as protocol analysis, has begun to delineate the cognitive and metacognitive processes that comprise expertise. By bringing these tacit processes into the open, students can observe, enact, and practice them with help from the teacher and from other students. [...]

A TEACHER MODELS GETTING STARTED

Assignment

(Suggested by students)

Write an essay on the topic “Today’s Rock Stars Are More Talented than Musicians of Long Ago.”

THINKING-ALOUD EXCERPT

I don’t know a thing about modern rock stars. I can’t think of the name of even one rock star. How about, David Bowie or Mick Jagger… But many readers won’t agree that they are modern rock stars. I think they’re both as old as I am. Let’s see, my own feelings about this are… that I doubt if today’s rock stars are more talented than ever. Anyhow, how would I know? I can’t argue this… I need a new idea… An important point I haven’t considered yet is… ah… well… what do we mean by talent? Am I talking about musical talent or ability to entertain—to do acrobatics? Hey, I may have a way into this topic. I could develop this idea by

Replies from: tanagrabeast
comment by tanagrabeast · 2021-03-05T14:29:10.304Z · LW(p) · GW(p)

Oh, wow. Yes. That. Looks like there's another book I don't need to write.

The fact that the concept was so fleshed out thirty years ago kind of pisses me off. My teacher training was so the opposite of that (a bunch of student group work nonsense). And I'm not finding apprenticeship familiar to new teachers currently, though strong veterans often seem to have at least a half-baked version they've derived from experience. I get a lot of wide-eyed "Yes!" when I share it with them.

comment by Adam Křivka (adam-krivka) · 2021-03-23T09:21:44.209Z · LW(p) · GW(p)

(first time commenting on LW)

This post resonated with some of the failures with SRS I felt this term in school. I wrote a short retrospective / post-mortem here, but it probably lacks a lot of context, so here are a few quotes that seem most relevant here:

I recently read the absolutely terrific LessWrong blogpost Seven Years of Spaced Repetition Software in the Classroom. The title speaks about classroom use, and many people think that's very distinct from personal use. But I tried to pretend like my teachers are using SRS so it's the same.

All students poop (at least in formal education). They have to, otherwise they'd go crazy. It now seems naive that I thought I could go through Caltech while retaining ALL IMPORTANT INFORMATION FROM ALL THE CLASSES.

...For example, entropy or enthalpy. You could define both of these in various ways, always carving out a new edge of them. They're not suitable for a prompt "What is entropy / enthalpy?", but maybe they're fine for "What's the formula for entropy / enthalpy?". 

But tell me now: how do I poop? Is the formula really the most important thing? No, what I'd really like is to deeply internalize a mental model of what entropy or enthalpy is (for the record, I still don't have them). How do I poop the formula and keep the model?

...here's what I want to do differently next term:

  • Optimize for passion and curiosity, not note-taking / spaced repetition workflow.
  • Distinguish between basic terminology and notation, and large and complicated concepts. Review terminology and notation frequently. Meditate over large and complicated concepts.
  • Look for the most important concepts and turn them into mental models. Forget the rest.
  • Learn. Don't learn using SRS. Don't learn by building a second brain in Roam. Just. Learn.
comment by MichaelLowe · 2021-03-08T12:40:24.202Z · LW(p) · GW(p)

Thank you for this great post. I would like to comment on a particular part: 

"Besides, even in the technical classes, forgetting is the near-universal outcome, and the long-term benefits are mostly conceptual — for if you don’t use these skills continuously for the rest of your life, you’re almost certainly going to lose them. Maybe more than once." 

This seems strongly like throwing the baby out with the bathwater. At least for math, it is, as you say, very clear that skills build on top of each other, and that weak students struggle greatly because they lack the foundational skills. I used to tutor friends and used to be regularly shocked by how much they forgot: how to divide by fractions, summation with exponents, just basic stuff. So the following is pretty likely: 

1) If you don't understand old concepts in math, you will not understand many new concepts.

2) Many students forget old concepts.

3) Spaced Repetition (not necessarily software!) ensures remembering.

We should be making sure they remember this material not in 20 years, but in 2 years if and when we make them struggle through calculus (unless one has a realistic plan for how to let the majority of underachievers in math just drop out of advanced math classes). This gives mediocre students a fair chance of actually building deeper understanding of what is going on at a conceptual level, and underachieving students a chance to get acceptable grades so that they do not become dejected and apathetic.

For math, students get assigned homework anyway. I see no reason why that homework should not include an automatically generated exercise aimed to repeat an old concept. Of course, multiple-choice flashcards would be a failure mode.

Replies from: Kaj_Sotala, tanagrabeast
comment by Kaj_Sotala · 2021-03-08T17:09:56.652Z · LW(p) · GW(p)

I recall reading someone who applied spaced repetition principles to the math homework they assigned. Instead of doing the normal thing where each week would introduce new content and have homework on that content, the homework for each week was 1/3 that week's content, 1/3 previous week's content, 1/3 content from any of the previous weeks. They claimed that it significantly boosted people's retention and exam scores.
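
As a rough illustration of that split, here is a minimal Python sketch. It is not the method from the post being recalled: the function name `build_homework`, the `problems_by_week` problem bank, and the choice to draw the last third only from weeks before the previous one are all assumptions made for the example.

```python
import random


def build_homework(problems_by_week, current_week, total=12, seed=None):
    """Return a list of (week, problem) pairs for one homework set.

    problems_by_week: hypothetical dict mapping a 1-based week number to a
    list of problems. Roughly a third comes from the current week, a third
    from the previous week, and the remainder from any earlier week. Early
    in the term, when older material does not exist yet, the current week
    fills the gap (as far as its own pool allows).
    """
    rng = random.Random(seed)
    share = total // 3

    def pick(weeks, count):
        # Pool every problem from the given weeks, then sample without replacement.
        pool = [(w, p) for w in weeks for p in problems_by_week.get(w, [])]
        return rng.sample(pool, min(count, len(pool)))

    older = pick(range(1, current_week - 1), total - 2 * share)
    previous = pick([current_week - 1], share)
    # Whatever the review pools could not supply is made up with new material.
    current = pick([current_week], total - len(older) - len(previous))
    return current + previous + older
```

With a bank of ten problems per week, calling `build_homework(bank, 5, total=12)` would give four problems from week 5, four from week 4, and four spread over weeks 1 to 3. Sampling at random is only one option; a teacher could just as well hand-pick the review items.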

Replies from: lejuletre
comment by lejuletre · 2021-03-08T18:42:26.311Z · LW(p) · GW(p)

Is this the post you're thinking of?

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2021-03-08T20:38:33.410Z · LW(p) · GW(p)

Looks like it, thanks! I'd have had no idea of how to re-find it.

comment by tanagrabeast · 2021-03-08T23:50:05.348Z · LW(p) · GW(p)

I don't see as much disagreement between us as you might be thinking. Precisely because I agree with your numbered points 1 and 2, I suggested it could be beneficial to compress most of our 12 years of math instruction down to a more intensive 2-3 years. That doesn't mean we couldn't instill useful basic arithmetic in lower grades. If we chose a smaller set of core basics, it could be quite practical to retain them over long summers and breaks -- at least for the students who stay in our system for the long haul. 

I'm also glad you brought up the fact that spaced repetition doesn't have to involve software. I should have done more to remind readers of this. I weave the spacing and testing effects into the fabric of my course in many ways that have nothing to do with software.

Carefully engineered homework assignments are great if you have motivated students. Take-home SRS could even work for that. Those students are usually fine, though. It's the apathetic middle I have to fight for, and they won't do homework regardless of how I try to incentivize it.

Moreover, I don't feel good about assigning homework to students who would hate to do it. School is already prison for those kids. I don't want to send prison home with them. As both a child and a parent, I have been too familiar with the toxic effects homework -- especially math homework -- can have on family relationships. Let kids have a light at the end of the daily tunnel, I say.

Is homework vital to a successful math program? I don't know. But I'm glad I don't teach math.

comment by cousin_it · 2021-03-04T13:43:19.495Z · LW(p) · GW(p)

I think the view of learning as memorization (or, as you suggest, repeated memorization and forgetting) isn't quite right. It's better to think of it as strength training, slowly building up ability that can be lost through disuse.

Learning a language does feel like memorization, but that's because your language ability is already trained. Math is harder, because most people's minds are untrained for math and need lots of simple repetitive exercises first. Art, music or sports are somewhere in between: they also need a ton of repetition, but at least we've figured out over centuries which exercises to use. Beginner programming is even harder than beginner math, because we don't even know which exercises we need.

Replies from: tanagrabeast
comment by tanagrabeast · 2021-03-04T14:57:05.478Z · LW(p) · GW(p)

I don't claim that learning is repeated memorization and forgetting though. Learning is when the brain updates its internal models in response to the information it has been chewing on. Forgetting the info after is mostly inevitable, but is not core to this model-update process. And though relearning has silver linings, this does not mean that it is ideal.

Replies from: CronoDAS
comment by CronoDAS · 2021-03-14T07:42:08.682Z · LW(p) · GW(p)

This seems like it would really suck if you're trying to learn something that doesn't have a simpler model than the thing itself. English spelling can be like this: you either know how to spell "through" and "separate" or you don't. I've always found foreign language vocabulary to be the hardest thing for me to learn in school. It's terribly incompressible. Knowing that "red" is "rojo" and that "blue" is "azul" doesn't help you at all when trying to remember that "green" is "verde". I wasn't terrible at it, but my usual trick of "remembering things by actually understanding what's going on, so I can compensate for imperfect memorization" was completely useless.

comment by Philip Hand (philip-hand) · 2021-05-21T00:58:35.768Z · LW(p) · GW(p)

Wow, thank you so much for this! I'm going through something like the same process as you, trying out Anki in the classroom and as an independent learning tool, and discovering its positives and negatives. I teach language, and it does help with learning words, though the way I use it is actually covert phonetics drilling. My students (Chinese, age 6-15) often simply haven't mastered the English sounds despite X years of learning. I set up Anki cards using words from a phonics textbook, so in the course of their Ankiing, whether individual or group, they have to distinguish between hem and ham, bad and bud, dog and dock, fuzz and fuss... There's the added benefit that they may remember the words, providing a working vocab to call on in grammar exercises. But I think it nicely illustrates the point you were making about learning concepts and patterns rather than the factoids themselves. What I really want early-stage learners to absorb is the phonics (the fact that the way you spell a word tells you what it sounds like, unfamiliar to many Chinese users because of Chinese characters); and the phonemes of English. Whether or not they remember what ham means is secondary.

comment by Brian Slesinsky (brian-slesinsky) · 2021-03-09T01:12:07.666Z · LW(p) · GW(p)

I’ve played around with Anki a bit, but never used it seriously because I was never sure what I wanted to memorize, versus look up when needed.

I wonder if it might be better to look at it a different way, using a note-taking tool to leverage forgetting rather than remembering? That is, you could use it to take notes and start reviewing cards more seriously when you’re going to take a test. Afterwards, you might slack off and forget things, but you still have your notes.

After all, we write things down so we don’t have to remember them.

Such a tool would be unopinionated about remembering things. You could start out taking notes, optimize some of them for memorization, take more notes, and so on. The important thing is persistence. Is this really a note-taking system you’ll keep using?

Teaching people to use such a tool would fall under “learning how to learn.” Ideally you would want them to take their own notes, see how useful it is for studying for a test, and get in the habit of using them for other classes. If not, at least they would know that such tools exist.

Back when I was in school, I remember that there was a teacher that had us keep a journal, probably for similar reasons. Maybe that got some people to start keeping a diary, who knows? For myself, I got in the habit of taking notes in class, but I found that I rarely went back to them; it was write-only. I kept doing it because I thought taking the notes helped a bit to remember the material, though.

comment by ArthurRainbow · 2021-04-03T20:45:40.583Z · LW(p) · GW(p)

Hi,

Thanks a lot for this detailed feedback. I had never heard of Anki being used to ask the whole room to review; interesting. I just shared it on the Anki subreddit.

If I may, I'd strongly suggest slightly editing your introduction to add that you are the teacher and that it's an English class for US students. Personally, I was not sure at first whether you were a student or a teacher, or what kind of things you taught (I'd suspect that teaching math would be more complex than foreign vocabulary, even if I've never tested it with classes).

You mention coding, and you also mention Anki. Did you think about reaching out to the Anki community for help? You probably already know that Anki has quite a lot of add-ons. That means that dozens of devs know the codebase very well and could have created features and worked with you to adapt the app. That may have added more freedom to your experimentation. I have little doubt that this concept of trying new stuff would interest enough people that, at least for simple changes, you'd have found devs offering some of their time.

For context, I'm a maintainer of AnkiDroid, I contributed considerably to Anki's codebase, and I created more than 60 add-ons myself. So I admit I am extremely biased here.

comment by RedMan · 2021-03-06T14:54:01.739Z · LW(p) · GW(p)

Did you get IRB approval for these human studies on children?

Are you familiar with Direct Instruction, which is reminiscent of the Mennonite school?

Have you ever tried SRS for muscle memory?

Replies from: tanagrabeast
comment by tanagrabeast · 2021-03-07T01:30:01.263Z · LW(p) · GW(p)

Did you get IRB approval for these human studies on children?

I'm not sure which is more absurd: the IRB approval process or the very idea of high school. I've often asked people to consider a thought experiment where everyone on Earth suddenly forgets that our educational system as we know it ever existed. Would we really reinvent it just like it is now? Hearing how it worked, would we scream in terror and cancel anyone who had taken part? (Status quo bias much?)

When I was studying stand-up comedy, I actually developed a bit in which I play-acted a researcher proposing high school to an ethics board. It went like this:

RESEARCHER: “I was thinking we could stick 35 sleep-deprived teenagers in a room for an hour and expose them to academic stimuli. After that, we'll do some tests on them.”

BOARD: “I see. Tell me more about your subjects.”

RESEARCHER: “Well, they’re minors, obviously.”

BOARD: “Okay…”

RESEARCHER: “And most of them will be enrolled against their will.”

BOARD: “And how long will you need them?”

RESEARCHER: “6 sessions a day for four years.”

BOARD: “Wait, hold on. Sample size? How many kids are we talking about, here?”

RESEARCHER: “All of them.”

BOARD: (mutterings among themselves) “Well, it sounds like everything is in order...”

Are you familiar with Direct Instruction, which is reminiscent of the Mennonite school?

Someone (probably on LW) pointed me to Direct Instruction a few years back, so yes, I'm acquainted with it. Because of the emphasis on staying fully reviewed on all relevant prior knowledge, I saw it as having obvious promise for technical subjects like math, in the hands of the right teacher. I was less convinced it made a good fit elsewhere, perceiving (perhaps unfairly -- I didn't dig too deeply) some big negative trade-offs:

  • Like with my whole-class Anki, it seems heavily reliant on the teacher's high-energy snake-charmer charisma. This makes it difficult to sustain for much of a class period and demands a great deal from a teacher who tries to do it all day long, day after day. This also makes it difficult to adopt broadly among teachers with different personalities.
  • It sounds brittle with regard to roster variance. Specifically, it seems pretty insistent on having everyone in the room up to speed. With careful tracking/grouping of students, this can be achieved, but in practice, kids move in to your school part way through the year and aren't on the same page. Or you only have the one or two teachers for that grade level math, so the slowest kids are in the same boat as the sharpest. I would think that one or two stragglers would grind the class to a halt, and that this would be statistically inevitable in larger classes. (I don't know if this makes DI math worse than the status quo, where plenty of students fall behind and get lost, but with less fanfare and hold-up for everyone else.)

Have you ever tried SRS for muscle memory?

No. I'm not seeing how that would work, or how that would be relevant to what I do, but I'm certainly curious. Do you have examples?

Replies from: gbear605, RedMan, Mo Nastri
comment by gbear605 · 2021-03-07T06:02:35.287Z · LW(p) · GW(p)

SRS for muscle memory

I don't know how they were thinking of it, but theoretically you can put any action into an Anki card: 

  • do a problem from chapter 1.5 of the linear algebra textbook
  • play through X specific piano piece
  • do five pushups

(though I doubt that the spaced repetition algorithm will really help with that last one)

I can further imagine that a skill which operates largely on muscle memory, like a martial art, could be done similarly. "Do five kicks" could help you get the muscle memory behind that specific kick into your head, as long as you can be an accurate judge of how well you performed the kick.

comment by RedMan · 2021-03-07T14:53:39.918Z · LW(p) · GW(p)

Thank you for the detailed response! I used it for learning knot tying. It seemed to work, and I wanted to know if anyone else had tried it for anything like that.

comment by Mo Nastri · 2021-03-07T02:02:36.191Z · LW(p) · GW(p)

In any case I'm glad you didn't bother to jump through the IRB's hoops; admittedly my impression of them is colored by Scott Alexander's https://slatestarcodex.com/2017/08/29/my-irb-nightmare/