The Simple Solow Model of Software Engineering

post by johnswentworth · 2019-04-08T23:06:41.327Z · score: 26 (10 votes) · LW · GW · 15 comments

Contents

  Prediction 1
  Prediction 2
  Prediction 3
15 comments

Optional background: The Super-Simple Solow Model

Software is economic capital - just like buildings, infrastructure, machines, etc. It’s created once, and then used for a (relatively) long time. Using it does not destroy it. Someone who buys/creates a machine usually plans to use it to build other things and make back their investment over time. Someone who buys/creates software usually plans to use it for other things and make back their investment over time.

Software depreciates. Hardware needs to be replaced (or cloud provider switched), operating systems need to be upgraded, and backward compatibility is not always maintained. Security problems pop up, and need to be patched. External libraries are deprecated, abandoned, and stop working altogether. People shift from desktop to browser to mobile to ???. Perhaps most frequently, external APIs change format or meaning or are shut down altogether.

In most macroeconomic models, new capital accumulates until it reaches an equilibrium level, where all investment goes toward repairing/replacing depreciated capital - resurfacing roads, replacing machines, repairing buildings rather than creating new roads, machines and buildings. The same applies to software teams/companies: code accumulates until it reaches an equilibrium level, where all effort goes toward repairing/replacing depreciated code - switching to new libraries, updating to match changed APIs, and patching bugs introduced by previous repairs.
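A minimal sketch of that dynamic, under toy assumptions of my own rather than anything measured: the team produces a fixed amount of effort per period, and a fixed fraction of the existing code base depreciates each period. The names `effort_per_period` and `depreciation_rate`, and the numbers plugged in, are purely illustrative.

```python
def simulate_code_base(effort_per_period, depreciation_rate, periods, initial_code=0.0):
    """Toy accumulation model: each period a fixed amount of effort is spent,
    and a fixed fraction of the working code stops working unless repaired."""
    code = initial_code
    history = []
    for _ in range(periods):
        maintenance = depreciation_rate * code      # effort needed just to stand still
        net_new = effort_per_period - maintenance   # whatever is left goes to new code
        code += net_new
        # Record the code level and the share of this period's effort spent on maintenance.
        history.append((code, maintenance / effort_per_period))
    return history


# Hypothetical numbers: 10 units of effort per quarter, 5% of code rots per quarter.
history = simulate_code_base(effort_per_period=10.0, depreciation_rate=0.05, periods=200)
equilibrium = 10.0 / 0.05  # level at which maintenance absorbs all effort

for quarter in (1, 20, 80, 200):
    code, share = history[quarter - 1]
    print(f"quarter {quarter:3d}: code = {code:6.1f}, maintenance share = {share:.0%}")
print(f"predicted equilibrium: {equilibrium:.1f}")
```

In this sketch the code base climbs quickly at first, then flattens out near effort divided by depreciation rate, at which point nearly all effort is maintenance.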

What qualitative predictions does this model make?

Prediction 1

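If a software company wants to expand the capabilities of their software over time, they can’t just write more code - the old software will break down if the engineers turn their attention elsewhere. That leaves a few options:

  • Hire more engineers
  • Hire/train better engineers
  • Figure out better ways to make the software do what it does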

Hiring more engineers is the “throw money at it” solution, and probably the most common in practice - but also the solution most prone to total failure when the VC funding dries up.

Hiring/training better engineers is the dream. Every software company wishes they could do so, many even claim to do so, but few (if any) actually manage it. There are many reasons: it’s hard to recognize skill levels above your own, education is slow and hard to measure, there’s lots of bullshit on the subject that’s hard to comb through, it’s hard to get buy-in from management, etc.

Figuring out better ways to make the software do what it does is probably the most technically interesting item on the list, and also arguably the item with the most long-term potential. This includes adopting new libraries/languages/frameworks/techniques. It includes refactoring to unify duplicate functionality. It includes designing new abstraction layers. Unfortunately, all of these things are also easy to get wrong - unifying things with significantly divergent use cases, or designing a leaky abstraction - and it’s often hard to tell until later whether the change has helped or hurt.

Prediction 2

New products from small companies tend to catch up to large existing products, at least in terms of features. The new product with a small code base needs to invest much less in fighting back depreciation (i.e. legacy code), so they can add new features much more quickly.

If you’ve worked in a software startup, you’ve probably experienced this first hand.

Conversely, as the code base grows, the pace of new features necessarily slows. Decreasing marginal returns to new features meet an increasing depreciation load, until adding a new feature means abandoning an old one. Unless a company is constantly adding engineers, the pace of feature addition will slow to a crawl as the code base grows.
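To put rough numbers on the startup-versus-incumbent comparison, here is a small sketch. It assumes (my simplification, not a measured relationship) that maintenance effort is proportional to code base size and that whatever effort remains goes to new features. The function names, units, and parameter values are all hypothetical.

```python
def new_feature_rate(engineers, code_size, output_per_engineer=1.0, depreciation_rate=0.05):
    """Effort left over for new features after paying the depreciation bill.
    Units are arbitrary; the defaults are illustrative assumptions."""
    total_effort = engineers * output_per_engineer
    maintenance = depreciation_rate * code_size
    return max(total_effort - maintenance, 0.0)


def engineers_needed(target_rate, code_size, output_per_engineer=1.0, depreciation_rate=0.05):
    """Head count required to ship `target_rate` of new features on top of maintenance."""
    return (target_rate + depreciation_rate * code_size) / output_per_engineer


# Same hypothetical team size, very different legacy burdens.
startup = new_feature_rate(engineers=10, code_size=20)
incumbent = new_feature_rate(engineers=10, code_size=180)
needed = engineers_needed(target_rate=startup, code_size=180)

print(f"startup:   {startup:.1f} units of new features per period")
print(f"incumbent: {incumbent:.1f} units of new features per period")
print(f"incumbent needs {needed:.0f} engineers to match the startup's pace")
```

With these made-up inputs, the two teams are the same size but the incumbent spends most of its effort standing still, and has to nearly double head count just to match the startup's feature pace.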

Prediction 3

Since this all depends on depreciation, it’s going to hit hardest when the software depreciates fastest.

The biggest factor here (at least in my experience) is external APIs. A company whose code does not call out to any external APIs has relatively light depreciation load - once their code is written, it’s mostly going to keep running, other than long-term changes in the language or OS. APIs usually change much more frequently than languages or operating systems, and are less stringent about backwards compatibility. (For apps built by large companies, this also includes calling APIs maintained by other teams.)

Redis is a pretty self-contained system - not much depreciation there. Redis could easily add a lot more features without drowning in maintenance needs. On the other end of the spectrum, a mortgage app needs to call loads of APIs - credit agencies, property databases, government APIs, pricing feeds… they’ll hit equilibrium pretty quickly. In that sort of environment, you’ll probably end up with a roughly-constant number of APIs per engineer which can be sustained long term.
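The "APIs per engineer" figure is just a ratio, and a back-of-the-envelope sketch makes that explicit. The hours below are invented for illustration; the real maintenance cost per integration would have to be measured for a given team.

```python
def sustainable_apis(engineers, hours_per_engineer_per_year, hours_per_api_per_year):
    """Number of external API integrations a team can keep alive if it did
    nothing but maintenance. Inputs are hypothetical planning numbers."""
    total_hours = engineers * hours_per_engineer_per_year
    return total_hours // hours_per_api_per_year


# Made-up figures: 1,800 working hours per engineer-year, and each integration
# costing roughly 300 hours per year in breaking changes, migrations, and bug fixes.
for team_size in (2, 5, 10):
    apis = sustainable_apis(team_size, 1800, 300)
    print(f"{team_size:2d} engineers -> about {apis} APIs ({apis / team_size:.0f} per engineer)")
```

Under these assumptions the per-engineer figure stays the same as the team grows, which is the sense in which the number of sustainable APIs per engineer is roughly constant.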

15 comments

Comments sorted by top scores.

comment by cousin_it · 2019-04-10T16:12:48.293Z · score: 9 (4 votes) · LW · GW

In most macroeconomic models, new capital accumulates until it reaches an equilibrium level, where all investment goes toward repairing/replacing depreciated capital—resurfacing roads, replacing machines, repairing buildings rather than creating new roads, machines and buildings.

Must be something wrong with the models then, because that doesn't sound like any place I've ever been. People don't prioritize maintenance above any and all new stuff; it's human nature to invest in new stuff even while old stuff crumbles.

The same is true for software. I wish there was a natural law limiting software bloat, but look around - do you think there's such a law? Can you name any large project that reached equilibrium and stopped growing? I can't. Sure, as the project grows it gets harder to maintain at the same quality, but people don't let that stop them! They just relax the standard of quality, let older and less important features go unmaintained, and keep piling on new features anyway.

comment by johnswentworth · 2019-04-10T17:49:35.094Z · score: 6 (3 votes) · LW · GW

Good points, this gets more into the details of the relevant models. The short answer is that capital equilibrates on a faster timescale than growth happens.

About a year ago I did some research into where most capital investments in the US end up - i.e. what the major capital sinks are. The major items are:

  • infrastructure: power grid, roads, railroads, data transmission, pipelines, etc.
  • oil wells
  • buildings (both residential and commercial)
  • vehicles

Most of the things on that list need constant repair/replacement, and aren't expanding much over time. The power grid, roads and other infrastructure (excluding data transmission) currently grow at a similar rate to the population, whereas they need repair/replacement at a much faster rate - so most of the invested capital goes to repair/replacement. Same for oil wells: shale deposits (which absorbed massive capital investments over the past decade) see well production drop off sharply after about two years. After that, they get replaced by new wells nearby. Vehicles follow a similar story to infrastructure: total number of vehicles grows at a similar rate to the population, but they wear out much faster than a human lifetime, so most vehicle purchases are replacements of old vehicles.

Now, this doesn't mean that people "prioritize maintenance above new stuff"; replacement of an old capital asset serves the same economic role as repairing the old one. But it does mean that capital mostly goes to repair/replace rather than growth.

Since capital equilibrates on a faster timescale than growth, growth must be driven by other factors - notably innovation and population growth. In the context of a software company, population growth (i.e. growing engineer headcount) is the big one. Few companies can constantly add new features without either adding new engineers or abandoning old products/features. (To the extent that companies abandon old products/features in order to develop new ones, that would be economic innovation, at least if there's net gain.)

comment by cousin_it · 2019-04-10T21:54:02.280Z · score: 15 (4 votes) · LW · GW

Ah, I see. Your post drew a distinction between "repairing buildings" vs "creating new roads, machines and buildings", but you meant something more subtle - do we build the new building to replace the old one, or because we need two buildings? That makes sense and I see that my comment was a bit off base.

comment by johnswentworth · 2019-04-10T23:02:40.833Z · score: 6 (3 votes) · LW · GW

Yeah, in retrospect I should have been more clear about that. Thanks for drawing attention to it, other people probably interpreted it the same way you did.

comment by Chris Leong (chris-leong) · 2019-04-09T02:33:36.686Z · score: 7 (4 votes) · LW · GW

There are other considerations that slow down large code bases:

  • The more features you have, the more potential interactions
  • The bigger a codebase is, the harder it is to understand it
  • Having more features means more work is involved in testing
  • Customer bases shift over time from early adopters to those who want more stability and reliability
  • When a code base is more mature, there's more chance that a change could make the product worse, so you have to spend more time on evaluation
  • A larger customer base forces you to care more about rare issues

comment by mr-hire · 2019-04-09T13:03:40.705Z · score: 6 (4 votes) · LW · GW

The biggest factor here (at least in my experience) is external APIs. A company whose code does not call out to any external APIs has relatively light depreciation load - once their code is written, it’s mostly going to keep running, other than long-term changes in the language or OS. APIs usually change much more frequently than languages or operating systems, and are less stringent about backwards compatibility. (For apps built by large companies, this also includes calling APIs maintained by other teams.)

Of course, the trade-off is that then YOUR engineers have to maintain the code for security updates/refactors when they realize it's unmaintainable or built wrong vs. having the API do it.

One way to look at this is through the lens of Wardley Evolution. When a new function is not that well understood, it needs to be changed very frequently, as people are still trying to figure out what the API needs and how to correctly abstract what you're doing. In this case, it makes sense to build the code yourself rather than using an API that knows less about your use case than you do. An example might be the first few blockchains writing their own consensus code instead of relying on Bitcoin's.

On the other extreme, when a certain paradigm is so well understood that it's commoditized, it makes sense to use an existing API that will keep up to date with infrequent security vulnerabilities instead of having your engineers do that. An example would be having webapps use existing SQL databases and the existing SQL API instead of writing their own database format and query language.

In the middle is where it gets murky. If you adopt an API too early, you run the risk of being in a place where you're spending more time keeping up with the API changes than you would writing your own thing. However, adopt it too late, and you're spending valuable time trying to come up with the correct abstractions and refactor your code so that it's more maintainable, whereas it would be cheaper to just outsource that work to the API developers.

comment by jimrandomh · 2019-04-11T00:41:38.365Z · score: 5 (3 votes) · LW · GW

As a working software engineer with experience working at a variety of scales and levels of technical debt, this mostly feels wrong to me.

One of the biggest factors in the software world is a slowly rising tide of infrastructure, which makes things cheaper to build today than they would have been to build a decade ago. Projects tend to be tied to the languages and libraries that were common at the time of their creation, which means that even if those libraries are stable and haven't created a maintenance burden, they're still disadvantaged relative to new projects which get the benefit of more modern tools.

Combined with frequent demand shocks, you get something that doesn't look much like an equilibrium.

The maintainability of software also tends to be, in large part, about talent recruiting. Decade-old popular video games frequently have their maintenance handled by volunteers; a firm which wants an engineer to maintain its decade-old accounting software will have to pay a premium to get one of average quality, and probably can't get an engineer of top quality at any price.

comment by johnswentworth · 2019-04-11T16:45:45.694Z · score: 8 (4 votes) · LW · GW

Possible point of confusion: equilibrium does not imply static equilibrium.

If a firm can't find someone to maintain their COBOL accounting software, and decides to scrap the old mainframe and have someone write a new piece of software with similar functionality but on modern infra, then that's functionally the same as replacement due to depreciation.

If that sort of thing happens regularly, then we have a dynamic equilibrium. As an analogy, consider the human body: all of our red blood cells are replaced every couple of months, yet the total number of red blood cells is in equilibrium. Replacement balances removal. Most cell types in the human body are regularly replaced this way, at varying timescales.

That's the sort of equilibrium we're talking about here. It's not that the same software sticks around needing maintenance forever; it's that software is constantly repaired or replaced, but mostly provides the same functionality.

comment by shminux · 2019-04-09T03:51:01.302Z · score: 5 (3 votes) · LW · GW

In many ways software repairability is much worse than that of physical objects, because there are no certifications for software quality, at best unit tests and regression tests, if that. The bit rot through changes in the environment, like APIs, interfaces, even CPU/GPU changes, only adds to that. Software cannot be maintained like, say, bridges (or fridges) because there are no spare parts you can find, and building them from scratch is at some point costlier than a rewrite, especially if the original designers and maintainers are all gone. So, a company needs to design for planned obsolescence if they want to avoid excessive maintenance costs (what you call "hire more engineers", only the cost of maintenance grows exponentially). Hiring/training better engineers is unfeasible, as you can imagine. There are not enough top 1% engineers to staff even the top 10 high-tech companies. Figuring out better ways to make software works for a while, but then the software expands to saturate those "better ways", and you are back where you started. Planned rewrites and replacements would cut the costs, but, like investing money into prevention in healthcare, that is something that is never a priority. And so we are stuck with sick geriatric fragile code bases that take a fortune to keep alive until they are eventually allowed to die long past their expiry date.

comment by JohnFisher · 2019-04-10T17:48:35.661Z · score: 3 (2 votes) · LW · GW

Sorry but the software world described here has little to do with my daily work in software. As most apps have moved to webapps, and most servers are now in the Cloud, and most devices are IoT cloud-connected, as all these trends have happened, the paradigm for software has evolved to maximizing change.

Software never was very re-usable itself, but frameworks and APIs turned out to have huge value, so now we have systems everywhere based on a layered approach from OS up to application, where application software is quite abstracted from the OS and hardware and support software (e.g. webserver or database). However, frameworks also change quickly these days: jQuery, Angular, React, Vue.js.

Cloud engineering is all about reliability, scalability, and a very rapid change process. This is accomplished through infrastructure automation, and process automation. Well-organized shops aim to release daily, and at the same time, have very good quality. We use CI/CD patterns that automate every step from build to deployment.

Containers are everywhere, but the next step is Kubernetes and serverless in the Cloud, where we hardly touch the infrastructure, and focus on code and API. I see no chance that code will last long enough to depreciate.

Making high-quality software is all about the process and the architecture. You just can't meet today's requirements building monoliths, on manually-managed servers.

comment by johnswentworth · 2019-04-10T21:17:39.438Z · score: 4 (2 votes) · LW · GW

Sounds like you're mostly talking about ops, which is a different beast.

An example from my previous job, to illustrate the sort of things I'm talking about: we had a mortgage app, so we called a credit report API, an API to get house data from an address, and an API to pull current pricing from the mortgage securities market (there were others, but those three were the most important). Within a six-month span, the first two APIs made various small breaking changes to the format returned, and the third was shut down altogether and had to be switched to a new service.

(We also had the whole backend set up on Kubernetes, and breaking changes there were pretty rare. But as long as the infrastructure is working, it's tangential to most engineers' day-to-day work; the bulk of the code is not infrastructure/process related. Though I suppose we did have a slew of new bugs every time anything in the stack "upgraded" to a new version.)

comment by JohnFisher · 2019-04-10T23:28:25.700Z · score: 1 (1 votes) · LW · GW

Well, I've heard those bank APIs break a lot. I think I am trying to say that software lifespan is not at all what it used to be 10-15 years ago. Software is just not a *thing* that gets depreciated, it's a thing that never stops changing. This company here too separates infrastructure engineering from software, but that's not how the big kids play, and I am learning some bitter lessons about why. It really is better if the developers are in charge of deployment. Or at least constantly collaborating with the DevOps crew and the Ops crew. Granted, every project has its special requirements, so no idea works everywhere. But "throw it over the wall" is going away.

Maybe this is all just this year's buzzwords, but I don't think so. I am seeing some startups going after rust-belt manufacturing software, where they are often still running on XP, and dare not change anything. These startups want to sell support for a much more highly automated process, with much more flexibility. Good business model or not, you just can't do that sort of thing in a waterfall release process.

comment by Hazard · 2019-04-09T01:44:37.945Z · score: 3 (2 votes) · LW · GW

I appreciate that you outlined what predictions are made from the Solow model applied to software. Do you know of any other models that might be applied?

comment by johnswentworth · 2019-04-10T05:32:28.567Z · score: 2 (1 votes) · LW · GW

Yeah, I used to have conversations like this all the time with my boss. Practically anything in an economics textbook can be applied to management in general, and software engineering in particular, with a little thought. Price theory and coordination games were most relevant to my previous job (at a mortgage-tech startup).

comment by ryan_b · 2019-04-11T20:27:29.585Z · score: 2 (1 votes) · LW · GW

This provides a much more nuanced background for decisions about making investments with a longer expected payoff.

I have in mind here examples like model checking, such as with TLA+, and choosing languages like LISP or Haskell over Java or Python. Often these things are viewed by management as a question of whether to get to market sooner or later, with vague promises of correctness and maintainability if they accept the delay.

Putting the conversation on the capital footing makes it a lot easier to have these kinds of discussions, including from a testing standpoint - it is much clearer to me now how I might approach trying to tell what kind of benefit a project would get out of model checking or different language paradigms. In particular, I think 'ease of implementing a new feature in mature software' is an interesting one that I never considered explicitly before.