Write Good Enough Code, Quickly

post by Oliver Daniels (oliver-daniels-koch) · 2024-12-15T04:45:56.797Z · LW · GW · 10 comments

Contents

  Principles for Good-Enough Code
  My ML Research Workflow
  Conclusion
  Related Articles / Sources of Inspiration
None
10 comments

At the start of my Ph.D. 6 months ago, I was generally wedded to writing "good code". The kind of "good code" you learn in school and standard software engineering these days: object oriented, DRY, extensible, well-commented, and unit tested. I like writing "good code" - in undergrad I spent hours on relatively trivial assignments, continually refactoring to construct clean and intuitive abstractions. Part of programming's appeal is this kind of aesthetic sensibility - there's a deep pleasure in constructing a pseudo-platonic system of types, objects, and functions that all fit together to serve some end. More freedom than math, but more pragmatism than philosophy. This appreciation for the "art" of programming can align with more practical ends - beauty often coincides with utility. Expert programmers are attracted to Stripe because of their culture of craftsmanship, and Stripe promotes a culture of craftsmanship (in part, presumably) because "building multi-decadal abstractions" (in the words of Patrick Collison) is useful for the bottom line.

And this is all well and good, if (as is more often the case than not) you expect your code to be used in 1, 2, 10 years. But in research, this is often not the case! Projects typically last on the order of weeks to months, not years to decades. Moreover, projects typically involve a small number (often 1 or 0) of highly involved collaborators, as opposed to the large, fragmented teams typical of industry. Moreover, speed is paramount. Research is a series of bets, and you want to discover the outcome of the bet as fast as possible. Messy code might incur technical debt, but you don't have to pay if you scrap the entire project.

I had heard advice like this going into my PhD, both in the context of research and product development generally (MVP, Ballmer Peak, etc). It took me a while to internalize it though, in part, I suspect, because there's an art to writing "messy" code too. Writing error-prone spaghetti code is not the answer - you need stuff to work quickly to get results quickly. The goal is to write good enough code, efficiently, but learning what good enough means is a skill unto itself.

Principles for Good-Enough Code

Below is a first pass at some guiding principles. I focused on ML research in Python, but I suspect the lessons are generalizable

  1. Future-t you is the target user
    • where t is something like an exponential distribution with median 1 day.
  2. Minimize indirection - Have as much of the code as is reasonable in a single notebook
    • This is the kind of advice that’s horrible for freshman CS students, but probably helpful for first-year PhD students [1] Having everything in one place increases context - you can just read the program logic, without having to trace through various submodules and layers of abstraction. It also encourages you to constantly review code which otherwise might be tucked away, naturally helping you to catch errors, identify improvements, or notice additional axes of variation in the system.

  3. Only refactor when you need to - but always refactor when you need to
    • When you’ve had the impulse two or three times to pull something out into a separate function or object, do it. As a default though, be very suspicious of coding activity that isn’t directly doing the thing.
  4. Use your context
    • Lots of best practices in software engineering revolve around providing context and constraints to future readers and editors, in the form of comments, assertions, documentation, and type checks. Some of this will be useful for you 5 minutes after writing it, but a lot of it won't. So don't worry too much about using enums instead of strings, adding docstrings for each function, and avoiding magic numbers. Semantic naming and parse commenting are useful enough.
  5. Copy and paste is your friend
    • In keeping with minimizing indirection, it's often better to reuse a component by copying and pasting rather than sharing it across functions/scripts. Not only does this improve context, but it also promotes decoupling. If you end up needing to modify the component for a particular usecase, you can do so without worrying about how the change will affect functionality elsewhere (the conserve of this is that if you want to make the same modification, you have to do it twice, so, as always, user discretion is required).
  6. You're still allowed to think - slow is smooth and smooth is fast
    • When told to prioritize speed in coding, we often imagine the rogue hacker, wizzing away at a terminal , no time wasted without a keystroke. And sure, maybe 10x engineers operate something like this. But for mere mortals, it's important to remember that you can still do a bit of planning before getting to work. For me, planning usually takes the form of pseudo-code comments, but a little diagram sketching and rubbing ducking won't hurt either. The key is to efficiently execute an imperfect plan - and this requires having an imperfect plan to begin with.
  7. Avoid unit tests - at least early on
    • The most obvious case of trading speed for reliability. In research, you should be constructing your code incrementally, running it at each step in a REPL or notebook. By the time you're done, you've basically covered the central use case (running the full experiment script), and don't have to worry about arbitrarily users exploiting weird edge cases. You are the target user. And running the script is often the only (integration) test you need (do check tensor shapes though, ML debugging is hard and all).
  8. Use an LLM
    • This should be obvious. As of December 14th 2024, I'd recommend Curser with Sonnet 3.5 (though I occasionally use O1 to work through some math)

Again, all this advice assumes a baseline of "standard software engineering practices" - I want to help cure you of deontic commitments like never repeating yourself. But if you don't need curing in the first place, you should probably reverse this advice.

My ML Research Workflow

With these principles in mind, I'll walk through my current research workflow. My goal is to fluidly transition and forth from a rough experimental notebook to a full experiment pipeline with tracking, sweeps, and results visualization.

Mileage on this exact setup may vary, but thus far I’ve found it strikes a great balance between flexibility and efficiency. Most significantly, I've found my "ugh [LW · GWfield" around moving from local experimental notebook to submitting cluster jobs has been substantially reduced.

Conclusion

So yeah, those are my tips and basic setup. Again, they apply most strongly to early stage research, and most weakly to developing large compressive pieces of infrastructure (including research infrastructure like PyTorch, Hugging Face, and Transformer-lens). In some sense, the core mistake is to assume that early stage research requires novel extensive research infrastructure[2]. Developing open source infrastructure is, to a first approximation[3] prosocial: the gains are largely born by other users. So by all means, develop nice open-source frameworks - the world will benefit from you. But if you have new research ideas that you're eager to try out, the best approach is often to just try them ASAP.

  1. ^

     I was initially shocked by how “messy” this GPT training script was - now I think it's the Way

  2. ^

     This meme has been propagated to a certain extent by big labs, who make the (true) point that (infrastructure produced by) research engineers dramatically accelerates research progress. But this can simultaneously be true while it also being the case that for a small research time with limited budget, myopically pursing results is a better bet

  3. ^

     Reputational gains aside

10 comments

Comments sorted by top scores.

comment by Tahp · 2024-12-15T15:51:39.338Z · LW(p) · GW(p)

I'm doing a physics PhD, and you're making me feel better about my coding practices. I appreciate your explicit example as well, as I'm interested in trying my hand at ML research and curious about what it looks like in terms of toolsets and typical sort-of-thing-one-works-on. I want to chime in down here in the comments to assure people that at least one horrible coder in a field which has nothing to do with machine learning (most of the time) thinks that the sentiment of this post is true. I admit that I'm biased by having very little formal CS training, so proper functional programming is more difficult for me than writing whatever has worked for me in the past writing ad-hoc Bash scripts. My sister is a professional software developer, and she winces horribly at my code. However, you point out that it is often the case that any particular piece of research code you are running has a particular linear set of tasks to achieve, and so:

  • You don't need to worry much about resilient code which handles weird edge cases.
  • It is often better to have everything in one place where you can see it than to have a bunch of broken up functions scattered across a folder full of files.
  • Nobody else will need to use the code later, including yourself, so legibility is less important

As an example of the Good/Good-enough divide, here's a project I'm working on. I'm doing something which requires speed, so I'm using c++ code built on top of old code someone else wrote. I'm extremely happy that the previous researcher did not follow your advice, at least when they cleaned up the code for publishing, because it makes life easier for me to have most of the mechanics of my code hidden away out of view. Their code defines a bunch of custom types which rather intuitively match certain physical objects. They wrote a function which parses arg files so that you don't need to recompile the code to rerun a calculation with different physical parameters. Then there's my code which uses all of that machinery: My main function that I have written is sort of obviously a nest of loops over discrete tasks which could easily be separate functions, but I just throw them all together into one file, and I rewrite the whole file for different research questions so I have a pile of "main" files which reuse a ton of structure. As an example of a really ugly thing I did, I hard-code indices corresponding to momenta I want to study into the front of my program instead of making a function which parses momenta and providing an argument file listing the sets I want. I might have done that for the sake of prettiness, but I needed to provide a structure which lets me easily find momenta of opposite parity. Hard-coding the momenta let me keep the structure I was using at front of mind when I created the four other subtasks in the code which exploited that structure to let me construct subtasks which needed to easily find objects of opposite parity.

Replies from: oliver-daniels-koch
comment by Oliver Daniels (oliver-daniels-koch) · 2024-12-16T00:30:17.127Z · LW(p) · GW(p)

thanks for the detailed (non-ML) example!  exactly the kind of thing I'm trying to get at

comment by Oliver Daniels (oliver-daniels-koch) · 2024-12-23T16:53:36.193Z · LW(p) · GW(p)

still trying to figure out the "optimal" config setup. The "clean code" method is roughly to have dedicated config files for different components that can be composed and overridden etc (see for example, https://github.com/oliveradk/measurement-pred). But I don't like how far away these configs are from the main code. On the other hand, as the experimental setup gets more mature I often want to toggle across config groups. Maybe the solution is making a "mode" an optional config itself with overrides within the main script

comment by brambleboy · 2024-12-19T15:24:42.362Z · LW(p) · GW(p)

At the start of my Ph.D. 6 months ago, I was generally wedded to writing "good code". The kind of "good code" you learn in school and standard software engineering these days: object oriented, DRY, extensible, well-commented, and unit tested.

I think you'd like Casey Muratori's advice. He's a software dev who argues that "clean code" as taught is actually bad, and that the way to write good code efficiently is more like the way you did it intuitively before you were taught OOP and stuff. He advises "Semantic Compression" instead- essentially you just straightforwardly write code that works, then pull out and reuse the parts that get repeated.

Replies from: oliver-daniels-koch
comment by Oliver Daniels (oliver-daniels-koch) · 2024-12-20T05:06:08.371Z · LW(p) · GW(p)

just read both posts and they're great (as is The Witness). It's funny though, part of me wants to defend OOP - I do think there's something to finding really good abstractions (even preemptively),  but that its typically not worth it for self-contained projects with small teams and fixed time horizons (e.g. ML research projects, but also maybe indie games). 

comment by Jatin Nainani (jatin-nainani) · 2024-12-15T13:32:33.681Z · LW(p) · GW(p)

Can't agree more with this post! I used to be afraid of long notebooks but they are powerful in allowing me to just think. 

Although while creating a script I tend to use "#%%" of vscode to run cells inside the script to test stuff. My notebooks usually contain a bunch of analysis code that don't need to be run, but should stay. 

Replies from: oliver-daniels-koch
comment by Oliver Daniels (oliver-daniels-koch) · 2024-12-16T00:29:12.431Z · LW(p) · GW(p)

Thanks! huh yeah the python interactive windows seems like a much cleaner approach, I'll give it a try

comment by anaguma · 2024-12-15T05:29:34.159Z · LW(p) · GW(p)

This is a great post, and I like the research process. Do you know if the LLM code completion in cursor is compatible with ipynb notebooks?

Replies from: oliver-daniels-koch
comment by Oliver Daniels (oliver-daniels-koch) · 2024-12-15T05:36:13.375Z · LW(p) · GW(p)

thanks! yup curser is notebook compatible