Unit Test Everything

post by DirectedEvolution (AllAmericanBreakfast) · 2022-09-29T18:12:28.850Z

    Inspiration
  The SELEX Experiment
  What a Unit Test Is And Isn't
  Why Unit Test The SELEX Experiment?
  Why Unit Test Everything?
  Unit Tests Are Part Of A Larger Process
  What Else Can We Unit Test?

Inspiration

Specializing in Problems We Don't Understand
You Are Not Measuring What You Think You Are Measuring
Humans are very reliable agents
Epistemic Legibility

The SELEX Experiment

For my MS research, I'm generating an aptamer for IL-10, a crucial anti-inflammatory immune signaling molecule. Aptamers are short pieces of DNA that bind a specific target. We discover them through a process of directed evolution called SELEX. By dunking IL-10 in a pool of around a quadrillion distinct short DNA sequences and amplifying the DNA that sticks, we can select for sequences that happen to bind IL-10.

In our case, the aptamer also needs to change its shape dramatically when it binds. IL-10 is a delicate protein, and normal SELEX protocols risk denaturing it. On top of these interesting engineering challenges are the ordinary pains that come with bringing a new method to a lab. We must adapt the approaches others have taken to the equipment we own, write new protocols, move tools around, make space to store things, coordinate schedules, and come to consensus despite the different forms of expertise each of us brings to the table.

We've made several modifications to existing SELEX protocols. This is novel enough to be a solid first-author publication for me if the experiment works. It's also complicated enough to be practically guaranteed to fail if we don't approach it very carefully at each step.

What a Unit Test Is And Isn't

Coming from a computational rather than a wet-lab background, I've approached the preliminary work for this experiment by concocting what I think of as unit tests. The term has its origin in software engineering.

Unit testing means breaking up code (or, in my usage, a real-world process) into granular chunks and designing ways to confirm that each chunk gives the expected output for a predefined, carefully controlled input. Because I apply the concept to processes and activities well beyond software engineering, we need to stretch it past its software roots while keeping its spirit intact.

Unit tests are supposed to be separate from the main process they're testing. In software engineering, they are typically left out of the build that gets shipped to users. When the process is deployed for real, it should not depend on the unit tests in any way to work correctly.

Unit Testing Principles, Practices, and Patterns identifies four "pillars of a good unit test":

Protection against regressions [bugs]
- A good unit test should protect against failure in high-risk chunks of your procedure. High-risk chunks tend to be large, complex, and significant for the output.
Resistance to refactoring [changes]
- A good unit test should be flexible enough to allow superficial and inconsequential changes in your methods without throwing a red flag. We want to test if there's a problem, not just if there's a change.
Fast feedback
- A good unit test is fast to run. This lets you budget more of them into your process. Or it lets you knock off early, whatever floats your boat!
Maintainability
- A good unit test is easy to explain, understand, and execute. How and why does it work? Is there anything tricky about the implementation?
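To show what the four pillars look like in actual code, here is a minimal Python sketch. The helper `stock_volume_needed` is hypothetical (it just applies the standard dilution relation C1·V1 = C2·V2), and the numbers are illustrative. The tests are plain functions that a runner such as pytest could collect.

```python
def stock_volume_needed(stock_conc: float, target_conc: float, final_volume: float) -> float:
    """Hypothetical helper: volume of stock to dilute to target_conc in final_volume.

    Uses C1*V1 = C2*V2. Both concentrations must share one unit (e.g. uM);
    the result has the same unit as final_volume (e.g. uL).
    """
    if target_conc > stock_conc:
        raise ValueError("target concentration cannot exceed stock concentration")
    return target_conc * final_volume / stock_conc


# Fast feedback: pure arithmetic with no I/O, so each test runs almost instantly.
# Resistance to refactoring: the tests check only inputs and outputs, never how
# the function computes its answer, so rewriting its internals won't break them.
def test_tenfold_dilution():
    # 100 uM stock -> 10 uM in 50 uL needs 5 uL of stock.
    assert stock_volume_needed(100.0, 10.0, 50.0) == 5.0


def test_no_dilution():
    assert stock_volume_needed(10.0, 10.0, 50.0) == 50.0


# Protection against regressions: an impossible request must fail loudly.
def test_rejects_impossible_target():
    try:
        stock_volume_needed(10.0, 100.0, 50.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError when target exceeds stock")
```

Maintainability comes from keeping each test small enough that its name and a single comment explain why it exists.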

Not everybody does unit tests, but some people go so far as to design their entire software engineering process around unit tests. Some even write their unit tests before they write the code itself!

We should distinguish unit tests from two related, but fundamentally different, concepts in software: business logic and exception handling.

In software engineering, business logic is the rules that determine how users interact with the information managed by the program. As an example, our university's purchasing software requires an extra approval from a lab's PI for orders over $1,000, which can delay orders by a day or more.

Business logic can be written as intended by the designer and still give a bad outcome. For example, our purchasing software doesn't flag orders that have been split into smaller amounts. This incentivizes students to break large orders into smaller, separate ones to get them faster, which adds costs for the university. Since the PI already had to review every order to supply the shortcodes before this final approval step, it would be surprising if this rule didn't cost the university more money than it saves in expectation.

Unit tests are meant to check that the business logic has been encoded and executed successfully. They work by choosing an input where it's easy to determine what the business logic output ought to be, and then checking to make sure that it really does produce that expected output.
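As an illustration, here is a minimal Python sketch of how the purchasing rule above might be encoded and unit tested. The function and constant names are hypothetical, and it assumes "over $1,000" means strictly greater than $1,000. The last assertion shows the limit of a unit test: it confirms the code matches the rule, but it cannot tell you the rule itself is a bad one.

```python
# Hypothetical encoding of the rule: orders over $1,000 need an extra PI approval.
PI_APPROVAL_THRESHOLD_USD = 1000.00  # assumption: "over" means strictly greater than


def requires_pi_approval(order_total_usd: float) -> bool:
    """Return True if an order needs the extra PI approval step."""
    return order_total_usd > PI_APPROVAL_THRESHOLD_USD


def test_requires_pi_approval():
    # Inputs chosen so the expected output is obvious from the rule itself.
    assert requires_pi_approval(1500.00) is True
    assert requires_pi_approval(999.99) is False
    # Boundary case: exactly $1,000 is not "over" $1,000.
    assert requires_pi_approval(1000.00) is False
    # Two separate $600 orders each skip the extra approval, even though together
    # they exceed the threshold. The test passes because the code matches the rule;
    # the loophole lives in the business logic, not in its encoding.
    assert not requires_pi_approval(600.00)
```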

Exception handling is a way to catch errors at runtime, while the main process is being executed. It happens after unit testing is complete. For example, I practice moving more gracefully through space by using the least physical force and the fewest unnecessary impacts needed for simple tasks like turning doorknobs, putting glasses away in the cabinet, or carrying bowls to the sink. When I pick up a bowl, I check whether my grip is strong enough to hold it even if I bump into something on my way to the sink. When I pipette in the lab, I look closely at the meniscus of the liquid in the pipette tip to make sure it's where I expect it to be.

While these real-world "exception handling" routines are themselves so useful that I may write a whole separate post about them, they aren't properly unit tests. They occur during, and may influence, the main process itself as it unfolds. They are an integral part of it, not something you "disable" when you are ready to execute the process for real. A fire drill is a unit test. A teacher evacuating their students in response to a fire is doing "exception handling," not unit testing.
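The same distinction can be sketched in Python. Everything here is hypothetical: `transfer_volume` stands in for one step of the real process, and its built-in check plays the role of eyeing the meniscus. The separate test function is the fire drill: it feeds in controlled inputs ahead of time and is never part of the real run.

```python
def transfer_volume(requested_ul: float, measured_ul: float, tolerance: float = 0.05) -> float:
    """Hypothetical main-process step with exception handling built in.

    The check runs on every real transfer and changes what happens next:
    a draw that is off by more than `tolerance` (as a fraction) raises an error.
    """
    if abs(measured_ul - requested_ul) > tolerance * requested_ul:
        raise RuntimeError(f"drew {measured_ul} uL, expected {requested_ul} uL; re-pipette")
    return measured_ul


def test_transfer_volume_flags_bad_draw():
    """Unit test: a rehearsal with controlled inputs, run before the real process."""
    assert transfer_volume(10.0, 10.2) == 10.2  # 2% off, inside the 5% tolerance
    try:
        transfer_volume(10.0, 8.0)  # 20% short, should be flagged
    except RuntimeError:
        return
    raise AssertionError("a 20% short draw should have been flagged")
```

Deleting the test function would leave the process working exactly as before; deleting the check inside `transfer_volume` would change how the real process behaves.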

We should also contrast "unit testing" with the vaguer notion of using any form of reflection or feedback to somehow inform what you are doing. The gut feeling that "it'll be fine" or "that seemed to go well" is not a unit test. You should be able to articulate explicitly what pieces of data are making you think things will "go fine," what it means to "go fine," and what alternative pieces of data would have caused you to think things were "not going fine." These intuitive reflections can be useful, but they can also be deceptive, and they're certainly not unit tests.

Why Unit Test The SELEX Experiment?

Our materials are precious, and it can take weeks to order more stuff. This experiment will take a minimum of 20 full days just for the initial SELEX, plus more time to have the results sequenced, resynthesized, tested, and written up. I'm also a relatively new member of the lab. Although the lab is swimming in cash, I expect that my PI's willingness to throw money at this project will hit a limit unless I show results. So it's crucial to conserve materials. I can't afford to throw everything at this method and have it just not work.

So I spend a lot of time these days creating and running unit tests. I want to be as sure as I can that every step in this protocol works the way it ought to. That takes even more time than just running the experiment. But we won't find success without these tests.

What kind of information do my unit tests give me?

They make sure our custom DNA is actually in the bottles we receive from the manufacturer.

They test to make sure we can illuminate the DNA in the agarose gel, that the DNA ladder resolves, and that our DNA ends up at the expected spot relative to the ladder (it doesn't, but we know why and aren't worried about it).

They make sure that our master mix and PCR protocol work with our custom DNA. They even check to make sure the plastic walls of our tubes aren't too thick to let the heat in.

They check to make sure our beads bind a special form of chemically modified "bead DNA," and that this bead DNA will bind our potential aptamers via a special complementary base pairing region in the middle.

They test our fluorescent DNA on the UV transilluminator to make sure the instrument visualizes it properly (ours doesn't, and we feared we'd have to rework the protocol in a totally janky way, but then another lab turned out to have a transilluminator that passed our unit test!).

It might be hard to tell from my facial expression in this picture, but I'm extremely happy with this unit test result - that middle band is the fluorescent dye Cy5 glowing beautifully on the tips of our potential aptamers!

This is nowhere close to a comprehensive list of our unit tests, and they're not all finished yet. They're built on a foundation of months of reading, planning, collating information, writing, and many conversations with my collaborator, our manufacturers, tech support, core facilities, our lab's PI, and other researchers in the field.
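The gel check mentioned above (whether our DNA lands at the expected spot relative to the ladder) can be turned into an explicit pass/fail criterion. Here is a minimal Python sketch under stated assumptions: it uses the common approximation that log10(fragment size) is roughly linear in migration distance between neighbouring ladder bands, and the ladder positions, expected size, and 10% tolerance are all hypothetical.

```python
import math


def estimate_size_bp(distance_mm: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate a band's size by interpolating log10(size) between the two
    ladder bands that bracket it. `ladder` holds (distance_mm, size_bp) pairs.
    """
    bands = sorted(ladder)  # sort ladder bands by migration distance
    for (d_near, s_near), (d_far, s_far) in zip(bands, bands[1:]):
        if d_near <= distance_mm <= d_far:
            frac = (distance_mm - d_near) / (d_far - d_near)
            log_size = math.log10(s_near) + frac * (math.log10(s_far) - math.log10(s_near))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


def band_matches(distance_mm: float, expected_bp: float,
                 ladder: list[tuple[float, float]], rel_tolerance: float = 0.1) -> bool:
    """Pass/fail criterion: estimated size is within rel_tolerance of the expected size."""
    estimate = estimate_size_bp(distance_mm, ladder)
    return abs(estimate - expected_bp) <= rel_tolerance * expected_bp


# Hypothetical ladder: bands at 10, 20, and 30 mm for 200, 100, and 50 bp.
example_ladder = [(10.0, 200.0), (20.0, 100.0), (30.0, 50.0)]
```

Writing the tolerance down in advance is the point: it says which result counts as "our DNA is where we expect," and a miss becomes a finding to explain rather than a vague unease.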

Why Unit Test Everything?

Figuring out how to unit test everything is wonderfully clarifying.

It forces you to think in detail about how you will know whether a step in the experiment worked. Once the unit tests are complete, you can run integration tests, making sure the units work together to give valid results. This lets you avoid "flying blind" as much as possible. Unit tests are a firehose of information. They help ensure your experiment is measuring what you think it's measuring.
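The difference between a unit test and an integration test can be sketched in a few lines of Python. The two steps below are hypothetical stand-ins for stages of a protocol: each gets its own unit test, and the integration test checks that the output of the first step is something the second step can use correctly (here, that both agree on units).

```python
def concentration_ng_per_ul(mass_ng: float, volume_ul: float) -> float:
    """Hypothetical step 1: sample concentration from total mass and volume."""
    return mass_ng / volume_ul


def volume_for_mass_ul(conc_ng_per_ul: float, target_mass_ng: float) -> float:
    """Hypothetical step 2: volume to pipette to deliver a target mass."""
    return target_mass_ng / conc_ng_per_ul


# Unit tests: each step checked in isolation with controlled inputs.
def test_concentration():
    assert concentration_ng_per_ul(500.0, 50.0) == 10.0


def test_volume_for_mass():
    assert volume_for_mass_ul(10.0, 20.0) == 2.0


# Integration test: feed step 1's output into step 2. Asking for the whole
# sample's mass should return the whole sample's volume; a units mismatch
# between the steps (say, ng/uL versus ug/mL) would break this round trip.
def test_round_trip():
    conc = concentration_ng_per_ul(500.0, 50.0)
    assert volume_for_mass_ul(conc, 500.0) == 50.0
```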

Unit tests create an opportunity to practice your experiment in a low-stakes, high-feedback setting. They also constitute a training program you can use to skill up the person who will inherit the protocol after you, if it proves successful.

Unit tests let you detect errors in advance and figure out workarounds. They turn intuitions about how things ought to work, drawn from your mechanistic models and past experience, into data about how things actually work.

This summer, I built a fence. My unit tests were focused on figuring out where to place the fenceline and the posts, designing the panels, making sure we weren't going to hit the gas lines with the auger, and ensuring we could dismantle an old section of fence without too much trouble. We'd measure and remeasure to make sure the posts were level and spaced correctly. Our most important test was building the gate frames and making sure they physically fit between the posts before adding panels. When we did this, we could fix an ill-fitting gate by taking out a couple of screws. The two times we didn't, we wound up having to dismantle and rebuild the entire gate in the hot summer sun.

In my bioengineering lab, the details that need testing include the state of repair and capabilities of my lab's equipment, the physical technique and visual acuity involved in manipulating strange materials like microscopic magnetic beads or tiny test tubes of clear liquid with enormously consequential little white particles floating in them, and the whirling schedules, egos, and biases of everybody in the lab, including myself. I've found that I simply cannot anticipate all the details in advance, so it's important to design a process that is oversight-tolerant.

An adequate suite of unit tests takes "a plan that is confident and well-thought-through but doomed to fail" to "a plan built from components that have survived the ultimate red-team of cold hard reality and might actually work in practice."

Unit Tests Are Part Of A Larger Process

My experience so far is that a unit test starts with an elaborately thought-through plan for how to go about your project. Your strategy has to be informed by experiences and data from others, mechanistic models about how the phenomenon you're trying to manipulate works, and concrete, written-out plans that you've made and refined over time. You have to have a deep, falsifiable model of how all the parts of your project are going to work.

This isn't the same as a waterfall design process, in which you have every single step of the project planned out to a T. Instead, there is a point at which you've planned enough to do some productive unit testing. A good unit test has a clear motivation for why you're doing it. You should feel confident that each possible result of the unit test would control at least one important decision. If you can't articulate each possible outcome and how you would react to them, or if you would make the same decision no matter the outcome, then there is no point yet in doing that unit test.
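The decision rule in the paragraph above is simple enough to write down. Here is a hypothetical Python sketch: a planned test is represented as a mapping from each possible outcome to the decision it would trigger, and it is worth running only if at least two outcomes lead to different decisions. An empty plan (outcomes you can't articulate) fails the check too. The example outcomes and decisions are illustrative, loosely based on the transilluminator test described earlier.

```python
def worth_running(decision_by_outcome: dict[str, str]) -> bool:
    """True if at least two possible outcomes lead to different decisions.

    An empty mapping means the outcomes haven't been articulated yet,
    so the test isn't worth running yet either.
    """
    return len(set(decision_by_outcome.values())) > 1


# Illustrative plan: each outcome controls a different decision.
transilluminator_test = {
    "Cy5 band visible": "image gels on our own transilluminator",
    "no band visible": "find another instrument before starting SELEX",
}

# Illustrative ritual: every outcome leads to the same action.
ritual_test = {
    "band visible": "proceed with SELEX",
    "no band visible": "proceed with SELEX",
}
```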

When you've successfully run an adequate suite of unit tests, they should cover the whole range of steps and challenges your real project is going to face. There is a tradeoff between what you'd spend on your unit tests and what you'd invest in committing to a full-scale attempt. It might also be that it truly wouldn't be practical to unit test some aspect of the project. Other times, you might be justifiably confident that you have the information you need already, and a particular unit test is unnecessary. That is fine. Unit tests should be done for a reason, not as a ritual.

What Else Can We Unit Test?

Where do we find unit tests elsewhere in life? What would benefit from a more robust suite of unit tests?

The Checklist Manifesto: Checklists are unit tests. They build in a check to ensure that the key steps for a successful process have been taken.
Cooking: Restaurants develop their menus by testing recipes before the restaurant opens. Popups and food carts are a way to test a restaurant concept before investing in a brick and mortar.
Dating and relationships: A date is a sort of unit test of a relationship. Is it possible to unit test relationships on a more fine-grained level? Can we reintroduce unit tests into long-term, steady relationships? Can we think of "unit testing" as a way to figure out what sort of relationship you want to have? Can you "unit test" the role of being a parent?
Scholarship: Practice tests and seeing how much material you can recall off the top of your head are unit tests for exams.
Music: For a classical or orchestral musician, a core practice technique is to break the music into small chunks and practice each one individually. Students who fail to do this make slow progress. Home practice sessions are unit tests for rehearsal, and rehearsals are unit tests for the performance.
Writing: A big stage in writing maturity is the transition to making a clean distinction between your first draft and the final publication. Drafts are unit tests of what you're trying to communicate. The communication is the final product. The individual pieces of writing you generate along the way and vet with your editors are the unit tests.

Thanks to Justis Mills and Connor Tabarrok for editing.
