A Taxonomy of Tech Debt

Riot Games Tech (https://technology.riotgames.com/)

Summary

When measuring a piece of tech debt, you can use impact (to customers and to developers), fix cost (time and risk), and contagion.

Contagion can be a developer’s worst enemy as a problem burrows in and becomes harder and harder to dislodge.

It is possible, however, to turn contagion into a weapon by making your fix more contagious than the problem.

Metrics

To make good decisions about which problems to fix now and which to fix eventually (or, realistically, never), we need a way to measure a particular piece of tech debt. I’ve identified three major axes to evaluate it on: impact, fix cost, and contagion.

Impact

The first axis is the most obvious: the impact of the debt.

This takes the form of both player-facing issues (bugs, missing features, unexpected behavior) and developer-facing issues (slower implementation, workflow issues, random useless shit to remember).

Fix Cost

The second axis has to do with the cost to fix the tech debt.

If we decide to fix an issue in our code or data, the fix will take a measurable amount of someone’s time. If it’s a deeply rooted assumption that affects every line of code in the game, it may take weeks or months of engineering time. If it’s a dumb error in a single function, it may be fixable in a matter of minutes. Regardless of the time to implement a fix, though, we must also consider the risk of actually deploying that fix.

Contagion

The third axis is something I’ve become obsessed with: contagion.

If this tech debt is allowed to continue to exist, how much will it spread? That spreading can result from other systems interfacing with the afflicted system, from copy-pasting data built on top of the system, or from influencing the way other engineers will choose to implement new features.

If a piece of tech debt is well-contained, the cost to fix it later is basically the same as the cost to fix it now. In that case, you can decide when a fix makes sense based purely on how much impact the debt has today.

If, on the other hand, a piece of tech debt is highly contagious, it will steadily become harder and harder to fix.

What’s particularly gross about contagious tech debt is that its impact tends to increase as more and more systems become infected by the technical compromise at its core.
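To make the three axes concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DebtItem` class, its field names, the units, and especially the idea that contagion compounds at a fixed monthly rate like interest. It is not a real scoring system, just a way to see why a contained problem can wait while a contagious one can't.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """One piece of tech debt scored on the three axes.

    All fields and scales are hypothetical, chosen only for illustration.
    """
    name: str
    impact: float      # pain per month today (arbitrary units)
    fix_cost: float    # engineer-days to fix today, deployment risk included
    contagion: float   # assumed fractional growth per month; 0.0 = well-contained

    def fix_cost_after(self, months: int) -> float:
        # Assumption: contagious debt compounds, like interest on a loan.
        return self.fix_cost * (1.0 + self.contagion) ** months

    def impact_after(self, months: int) -> float:
        # Assumption: impact spreads at the same rate as the fix cost.
        return self.impact * (1.0 + self.contagion) ** months


contained = DebtItem("ugly but isolated spell script",
                     impact=2.0, fix_cost=5.0, contagion=0.0)
spreading = DebtItem("copy-pasted config pattern",
                     impact=2.0, fix_cost=5.0, contagion=0.10)

for item in (contained, spreading):
    print(f"{item.name}: fix now = {item.fix_cost:.1f} days, "
          f"fix in 12 months = {item.fix_cost_after(12):.1f} days")
```

Both items look identical today. Under these made-up numbers, the contained one costs the same a year from now, while the spreading one costs roughly three times as much and hurts roughly three times as much in the meantime.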

Types of Debt

Now that we have a framework for measuring a particular piece of tech debt, let’s talk about some broad categories of tech debt:

Local Debt

Local debt resembles the classic “black box” model of programming. As far as the rest of the game is concerned, the local system (spell, network layer, script engine) works pretty reliably.

No one needs to keep the debt in mind as they develop around the system. But if anyone opens the lid and looks inside, they’ll be horrified, disgusted, or completely confused by what they see.
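As a rough illustration, here is a hypothetical Python function (the spell, its ranks, and its numbers are all invented). From the outside it has a clean signature and returns the right answer. The inside is the part you'd rather not look at.

```python
def cooldown_seconds(rank: int) -> float:
    """Return the cooldown for a (hypothetical) spell at ranks 1 through 5.

    Callers only ever see this signature, and it behaves reliably.
    """
    # Everything below is the mess behind the lid.
    table = "12|11|10|9|8"        # balance data smuggled in as a string
    parts = table.split("|")
    i = 0
    while True:                   # hand-rolled loop instead of plain indexing
        if i + 1 == rank:
            return float(parts[i])
        i = i + 1
        if i >= len(parts):
            raise ValueError(f"rank must be 1..5, got {rank}")
```

Nothing outside this function has to know about the string table or the loop, which is exactly what keeps the debt local: you could rewrite the body tomorrow without touching a single caller.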

MacGyver Debt

MacGyver debt is named after the TV show from the mid-’80s. Angus MacGyver would solve problems using his Swiss Army knife, duct tape, and whatever else was on hand.

His solutions often involved attaching two unlikely pieces; in the context of tech debt, this means two conflicting systems are “duct-taped” together at their interface points throughout the codebase.
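Here is a small sketch of what that duct tape can look like, using Python and two invented position formats (`LegacyPos` and `NewPos` are hypothetical names, not anything from a real codebase). Each format has helpers that only understand that format, so any code that needs both ends up converting back and forth at the seam.

```python
from typing import Dict, Tuple

# Two hypothetical position formats that grew up in the same codebase.
LegacyPos = Tuple[float, float]   # (x, y), used by the older system
NewPos = Dict[str, float]         # {"x": ..., "y": ...}, used by the newer one


def to_new(pos: LegacyPos) -> NewPos:
    return {"x": pos[0], "y": pos[1]}


def to_legacy(pos: NewPos) -> LegacyPos:
    return (pos["x"], pos["y"])


def legacy_distance(a: LegacyPos, b: LegacyPos) -> float:
    """Older helper: only understands tuples."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def new_move(pos: NewPos, dx: float, dy: float) -> NewPos:
    """Newer helper: only understands dicts."""
    return {"x": pos["x"] + dx, "y": pos["y"] + dy}


def distance_after_move(start: LegacyPos, dx: float, dy: float) -> float:
    # The duct tape: convert in, call the new system, convert back out.
    moved = new_move(to_new(start), dx, dy)
    return legacy_distance(start, to_legacy(moved))


print(distance_after_move((0.0, 0.0), 3.0, 4.0))
```

One pair of conversions is harmless. The debt shows up when calls like `to_new` and `to_legacy` are sprinkled across hundreds of interface points and every new feature has to pick a side.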

Foundational Debt

Foundational debt is when some assumption lies deep in the heart of your system and has been baked into the way the entire thing works. Foundational debt is sometimes hard to recognize for experienced users of a system because it’s seen as “just the way it is.”

A hilariously stupid piece of real world foundational debt is the measurement system referred to as United States Customary Units. Having grown up in the US, my brain is filled with useless conversions, like the fact that there are 5,280 feet in a mile, 2 pints in a quart, and 4 quarts in a gallon. The US government has considered switching to metric multiple times, but we remain one of seven countries that haven’t adopted Système International as the official measurement system. This debt is baked into road signs, recipes, elementary schools, and human minds.

Data Debt

Data debt starts with a piece of tech debt from one of the other categories.

Perhaps it’s a bug in the scripting system, a less-than-desirable file format for items, or two systems that don’t play very well with each other. But then a ton of content (art, scripts, sounds, etc.) gets built on top of that code deficiency. Before too long, fixing the initial tech debt becomes extremely risky and it becomes painfully hard to tell what you’ll break if you try to fix anything.
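A toy version of this in Python, with every name and number invented for illustration: a hypothetical script-engine helper has an off-by-one bug, and the content authored on top of it has quietly compensated. Fixing the code "correctly" then breaks the content.

```python
def buff_ticks_buggy(duration: int) -> int:
    """Hypothetical script-engine helper with an off-by-one bug:
    it applies one tick fewer than the requested duration."""
    return len(range(1, duration))   # should have been range(duration)


def buff_ticks_fixed(duration: int) -> int:
    """The same helper with the bug fixed."""
    return len(range(duration))


# What the designers actually wanted.
intended = {"poison": 3, "burn": 5}

# What they authored: they noticed they were getting one tick fewer
# than requested and bumped every value by one to compensate.
content = {"poison": 4, "burn": 6}

with_bug = {name: buff_ticks_buggy(d) for name, d in content.items()}
after_fix = {name: buff_ticks_fixed(d) for name, d in content.items()}

# Every entry whose behavior changes once the engine bug is fixed.
broken_by_fix = [name for name in content if after_fix[name] != intended[name]]

print("content is correct with the bug in place:", with_bug == intended)
print("content broken by the fix:", broken_by_fix)
```

With two entries you can audit this by hand. With thousands of scripts, items, and assets, some compensating for the bug and some not, you can no longer tell which is which.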

My favorite real world example for understanding data debt is DNA. The genome of an organism is slowly built up over millions of years through lossy copies (mutations), transcription errors, and evolutionary pressure. Some copy errors are useless but benign, others are harmful, and others confer powerful advantages. Attempting to figure out what any piece of DNA actually does is incredibly difficult. We fully understand what the base pairs mean, and how sets of base pairs translate into amino acids for protein construction. We are even starting to understand more about some non-coding roles that DNA can play. But in the 3 billion plus base pairs of the human genome, there’s still so much we don’t even remotely understand. Radiolab’s episode about CRISPR highlights one such puzzle that was recently cracked.