What to Consider Before Embarking on a Rewrite

The itch to rewrite a piece of software from scratch can be hard to ignore. However, as many developers learn over the course of their careers, rewrites are tricky to pull off because of the sheer amount of complexity and effort involved. A rewrite that is executed poorly or done for the wrong reasons will result in a loss or, if you're lucky, little to no gain. Thus, before embarking on that big rewrite, it’s important to consider why it’s required and how we can increase the likelihood of success.

Let's take a look at some of the steps we can take throughout the decision-making process. Typically, software engineers will be the first to raise the alarm because they're closest to the source. When that happens, it’s important to investigate why. Intuition is a good indicator that something is up, but it doesn't necessarily point us to the root cause or the appropriate solution.

When we feel like a piece of software is becoming hard to deal with, it can be tempting to jump to conclusions and say that it has to be rewritten from scratch in a shiny new technology. This is like the example from "Thinking, Fast and Slow", where an investor picked Ford stock because he had been impressed by Ford at a car show. He had a tough question in front of him: "Should I invest in Ford stock?", but subconsciously answered an easier one: "Do I like Ford cars?". We all have that tendency, so we have to make sure we're asking the right question, and not the easy one about which technology we like.

Therefore, before we start prescribing a solution, we must make sure we have a good understanding of the problem. What is it that we’re trying to fix here?

The proper reasons for requiring a change are roughly going to fall into one of two categories. Either it’s something needed in service of our users and that helps us with company objectives, or it has a big enough impact on developer productivity and happiness to make it worth our while.

If it's something related to users or product, then it's usually easier to tell when there's an issue. When we look at metrics such as performance, weekly bug counts, incidents, and mean time to restore, and we can't find any explanation other than application-related problems, then it's clear we should fix something in the product. This doesn't necessarily mean a rewrite is in order (we'll investigate that question later), but it does mean something has to improve.

Now, if we’re talking about issues related to developer productivity, then humans are involved so the topic becomes quite a bit more complex. We can still look at metrics here, but it’s important to realize that when they relate to human behavior, the picture the metrics alone can paint will not be black-and-white. There’s going to be some gray area, and we'll need to interpret everything in context.
For example, if the cycle time is trending upwards, we should rule out other causes, like how the team breaks up stories, before concluding that it's caused by technical debt. If the team is healthy and has been stable for a while, then determining whether it's a tech or process issue should be pretty easy.
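To make "trending upwards" a little more concrete, here is a minimal Python sketch that fits a straight line through a series of cycle times and reports the slope. The function name and the weekly figures are made up for illustration, and a least-squares slope is only one of several reasonable ways to look at a trend. It says nothing about the cause, which still has to be interpreted in context.

```python
from statistics import mean


def cycle_time_trend(cycle_times_days):
    """Return the least-squares slope of cycle time (in days) per period."""
    n = len(cycle_times_days)
    if n < 2:
        raise ValueError("need at least two data points to see a trend")
    xs = range(n)  # period index: 0, 1, 2, ...
    x_mean = mean(xs)
    y_mean = mean(cycle_times_days)
    numerator = sum(
        (x - x_mean) * (y - y_mean) for x, y in zip(xs, cycle_times_days)
    )
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical median cycle time per week, in days.
weekly_cycle_times = [3.0, 3.5, 4.0, 4.5, 5.0]
slope = cycle_time_trend(weekly_cycle_times)
print(f"Cycle time changes by {slope:+.2f} days per week")
```

A positive slope only tells us that something is slowing down. Whether that something is technical debt, story size, or a change in the team is the question the metric cannot answer on its own.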

Once we've identified the category that the problem falls in, have gotten a more objective view of the issue, and determined that it is indeed a technological problem, then we can look at the harder part of the decision, which is figuring out whether a rewrite is appropriate and whether the cost is acceptable.

You may have heard that Google rewrites most of its software every few years, but you're unlikely to be a Google-scale company with more than 20,000 engineers and billions of dollars in profits.

The reality is that rewrites are both costly and risky. The original code has been battle-tested for what may have been years. When we decide to rewrite, all that complexity will resurface and will have to be re-implemented. Knowing that, we can assume there will be surprises during development, which will cause us to take longer to get to parity, which means it will take longer before we can do any new product improvements. On top of that, things could happen during that period that cause the company to shift focus and if we're caught up in the middle of a rewrite, then we may need to shelve it. That could mean losing months of work.

For those reasons, it's preferable to go for an incremental approach where possible. With this approach we try to identify the biggest pain points and start chipping away at them one by one. That way, we'll start providing value much faster than a rewrite ever would. Additionally, it's less risky, because we can change direction between increments and won't lose a huge amount of work if we have to drop something.

In many cases it's also difficult to answer the question: Is the cost worth the gain? There's no easy answer for this. The best we can do is make a reasonable estimate based on a cost-benefit analysis. Doing this in some quantifiable way is good, as long as we are aware that it can never be completely accurate and only serves as a rough estimate. For any significantly complex piece of software, there will always be unintended consequences and we should be prepared to deal with those.

We've already seen that we tend to underestimate the cost, and while doing our analysis we should keep in mind that we often overestimate the benefits of switching technologies too. "Hype is the plague on the house of software", as Robert Glass put it in "Facts and Fallacies of Software Engineering". Changes in tools and technologies typically only result in a 5 to 30 percent increase in productivity and quality, and that's after the initial learning curve during which productivity is lower. So while there are definitely cases where technology changes are warranted, we should be aware of our bias towards underestimating the costs and overestimating the benefits. Rewriting a React app in Vue probably won't benefit you much, but if you are still hiring COBOL developers, it might be worth thinking about incrementally replacing that system with something more modern to improve scalability and maintenance costs.
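As a rough illustration of such an estimate, here is a minimal Python sketch of a break-even calculation. The function name, the rewrite cost of 36 engineer-months, and the team of six are all hypothetical; only the 5 and 30 percent gains come from the range quoted above. It also ignores the initial learning-curve dip, so if anything it errs on the optimistic side.

```python
def break_even_months(rewrite_cost_months, team_size, productivity_gain):
    """Months after the rewrite until the productivity gain repays its cost.

    rewrite_cost_months: engineer-months spent on the rewrite (assumed input).
    team_size: engineers working on the system afterwards (assumed input).
    productivity_gain: fractional gain, e.g. 0.05 for 5 percent.
    """
    if productivity_gain <= 0:
        return float("inf")  # the rewrite never pays for itself
    # Engineer-months won back per calendar month thanks to the gain.
    monthly_saving = team_size * productivity_gain
    return rewrite_cost_months / monthly_saving


# Hypothetical scenario: 36 engineer-months of rewrite work, team of six.
for gain in (0.05, 0.30):
    months = break_even_months(
        rewrite_cost_months=36, team_size=6, productivity_gain=gain
    )
    print(f"{gain:.0%} gain: break even after {months:.0f} months")
```

With these made-up inputs, a 5 percent gain takes about ten years to repay the rewrite and a 30 percent gain takes under two, which shows how sensitive the conclusion is to the benefit we assume.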

Now, once we've truly exhausted all other options, and are certain that the benefits will outweigh the costs, we can conclude that a rewrite is in order and be reasonably confident that it's the right solution.

DIV Games Studio

A few months ago, I discovered that an old website, which I used to visit during my childhood days, was back up again. This triggered a bit of nostalgia in me about my experience of learning to code as a kid and how it influenced my life.

“DIV Arena” might not mean much to you, but I’m sure that for quite a few people it was an important part of their life. It used to be the main website where people would upload and discuss games that they made via a program called “DIV Games Studio”.

MikeDX restarted div-arena.co.uk again in 2016

Like many kids, I used to play a lot of computer games, especially Total Annihilation. That game got me to spend so much time on the computer that it drove my dad crazy. He looked at it as an addiction, probably rightfully, and tried to lure me away from it in creative ways, but only with limited success. I think that eventually he decided to change his approach with the goal of getting me to spend my computer time more productively. That’s when he bought me DIV Games Studio.

As I’ve hinted at above, this was essentially a desktop environment optimised for making your own computer games. It offered a code editor, a graphics drawing program and a programming language, all intended to lower the barrier and teach people how to build games.

A screenshot of the DIV Games Studio desktop environment

I was super excited to start making games, so I tried it out immediately, but I was also twelve and didn’t have a clue about how programming worked. The first time I just started typing a story in the bright blue terminal, thinking that would somehow be translated into a game with graphics and everything.

Disappointed when that didn’t work, I asked my dad for help again. He kindly told me that’s not quite how it works. If I wanted to learn to make a video game, I would have to read the manual and learn the programming language. Unfortunately, he couldn’t help me either, because while he had learned some Pascal at some point, that was many, many years ago. So I set out to learn programming on my own.

Luckily, DIV came with a manual and a tutorial on how to build a game similar to Space Invaders. The tutorial was reasonably easy to follow, and I was able to build my first game. It definitely took more patience than I’d ever needed before, but I did it and it was satisfying. After that, it became more difficult though, because now there was no more tutorial to guide me. If I wanted to do something new, I was on my own with a dial-up internet connection and a pre-StackOverflow internet. That was hard, and I remember giving up many times when I got stuck, but somehow I always kept coming back and I often found solutions via div-arena.co.uk’s discussion forums. It taught me how to persevere more and how satisfying it could be to build something from scratch while solving all kinds of technical mysteries along the way.

I built a few more simple games, but eventually I kind of lost momentum and didn’t program much for a few years. It was only when I was 19, while I was studying something not-so-programming-related, that I realized this was what I wanted to do.

Either way, I have a lot to thank DIV Games Studio for, because once I picked up programming again, it was a lot easier for me to get started.

The Software Muscle

If you apply small, but relatively frequent stress to a muscle, then over time it will adapt and grow stronger. We don’t frequently talk about it in the context of software development, but I believe it holds here too. By frequently applying small amounts of stress to our software and processes, we can greatly improve them over time.

If you look at the practices that exist today, you’ll find plenty that follow this principle. When doing continuous deployment, we deploy automatically on a frequent basis, often even on every commit to master. Each deployment applies a little bit of stress. And sometimes… things will break! But that danger is what helps us make our process more resilient. Would you rather deploy a huge change once a year, with the risk of massive downtime that's hard to recover from? Or would you rather deploy small changes, many times per day, with each potential issue helping you learn more about your deployment process? I know which one I prefer.

Of course, at first, switching to this flow can be pretty scary, and probably for a good reason. There are many things that can go wrong if you aren’t used to deploying that frequently. If you only work out once a year, you may get hurt if you’re careless. However, once you commit to moving towards more frequent deploys, you’ll learn to tackle these issues one by one. Note that you don’t have to deploy on every commit from day one. You could start by deploying once per workday or once per week. And, of course, depending on your context, there might be constraints which prevent you from reaching the “deploy-every-commit” state, but the principle still holds and it’s still valuable to deploy as frequently as you can afford.

I see another application of the principle of “software as a muscle” in services that automatically update your dependencies, like Dependabot or Depfu. These services watch your dependencies for new releases and then automatically make a pull request with an attached changelog. As a result, we end up with frequent, smaller dependency updates, instead of the semi-regular “Let’s update all the libraries and see what breaks”. Again, the stresses are smaller and more frequent, and something may break occasionally, but this helps us improve over time. Our dependencies stay up-to-date, which makes the software more enjoyable to work with.

Similarly, I see the idea at play in the Chaos Monkey tool built by Netflix. Quoting their README file:

Chaos Monkey randomly terminates virtual machine instances and containers that run inside of your production environment. Exposing engineers to failures more frequently incentivizes them to build resilient services.

Finally, I believe it’s also one of the principles that makes Test-Driven Development valuable. By adding tests to our code, we apply a bit of stress to it, which verifies that the code behaves as we expect it to. When we change the code in the future, those tests will again verify that we are not breaking anything.
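As a small illustration of tests acting as repeated stress, here is a minimal Python sketch. The `apply_discount` function and its rules are invented for this example, and the tests are plain functions with assertions, written so that a test runner such as pytest could pick them up. It is a sketch of the idea, not a full Test-Driven Development workflow.

```python
def apply_discount(price_cents, percent):
    """Return the price after a percentage discount, rounded down to whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


def test_no_discount_keeps_price():
    assert apply_discount(1000, 0) == 1000


def test_quarter_off():
    assert apply_discount(1000, 25) == 750


def test_rejects_invalid_percent():
    try:
        apply_discount(1000, 150)
    except ValueError:
        return  # expected: invalid input is refused
    raise AssertionError("expected ValueError for percent > 100")
```

Each run of these tests pushes on the code a little. If a later change to `apply_discount` alters the rounding or drops the validation, one of the tests fails right away instead of the problem surfacing in production.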

If you see any other places in software where this applies, please let me know! I’d love to hear from you.

Short iterations

Sometimes it’s easy to forget the value behind short iterations, so let’s discuss why they’re important.

While it’s sometimes true that splitting a big change into smaller pieces can be hard, it’s still worth it. Whenever you’re going through a long iteration, you aren’t getting feedback. You’re working in a bubble, and you might be running headfirst into a wall.

Short iterations are important because the iteration is your feedback loop. Once you finish an iteration you can get feedback on it. Getting it quickly is important, because it allows you to adjust before much has been invested. Changing direction early is always easier than changing it later.

The feedback comes from multiple sources, by the way. You can get feedback from your colleagues, from tests, from seeing an idea work in production, from users, or in many other ways. Some issues and improvements are hard to predict, and feedback helps surface those quicker. That’s why short iterations are important. Getting feedback and adjusting based on it is the goal.

This post was originally published on Medium on Nov 26, 2015.