How to avoid Big Bang Deployment

8 min read · Jan 4, 2021

Have you ever forgotten to apply a DDL script in the production environment before deploying your service, and the deployment failed?

Have you ever done a deployment that didn’t work, needed to roll it back, and found it was easier to fix the problem than to roll back the deployment?

Have you ever been afraid of deploying a list of new services into the production environment because it was too much change?

If any of the answers is yes, you probably did a Big Bang Deployment.

Facing the problem

When I was at one of my first companies, we usually deployed new features at night to decrease the risk of unavailability for our customers. One day, after a new feature deployment, everything suddenly stopped working.

We started to troubleshoot the problem, but we couldn’t figure out what was happening. Nothing seemed to work correctly, even services that we hadn’t touched. A few minutes went by and we decided to roll back that feature. After the rollback, everything was still broken.

Finally, we restarted the Application Server and everything came back to normal. We didn’t understand what had happened. We ended that night without our feature, stressed, and with considerable downtime.

The next day, we figured out that we had deployed the services in the wrong order: this crashed a critical service, and the failure cascaded to the others. That night we had tried to deliver several components at once, including backend, frontend, database, enterprise service bus, and orchestration. Unless they were deployed in the right order, they would not work.

Why did we let it go to that point?

The natural behavior

The big bang deployment is the natural outcome of the way we usually develop a new feature. The process goes something like this:

We discuss a new feature to be developed with the product manager, business people, and other software engineers
We identify the services and/or applications we need to change
We identify the tests we need to do to make sure everything will work fine
We implement all the components and services we need to change
We do end to end tests to make sure everything is good
We deploy it as it is needed

It looks simple, and it usually works for some of the features that we need to implement. The problem only appears when the feature is not isolated and needs changes across several components, such as frontend, backend, database, and other services.

Now stop and think about the features you worked on last week. How many of them were isolated?

Almost every feature needs more than one component to change and this natural process brings a lot of problems to those features and their deployment process.

The big bang deployment problems

There are some problems with the previous simple workflow.

It assumes all bugs will be caught in the development and test steps. Have you ever delivered a tested feature and then found a new bug when you validated it in the production environment? I hope you usually validate your features, and I’m sure you have found a new bug this way at some point in your career.
It increases the complexity and the coordination effort. Changes depend on each other. Most of the developers that I know don’t like to coordinate things with other people, so this coordination effort is neglected, causing problems like the deployment-order incident we just saw.
It takes a long time to test and a long time to receive feedback from peers. If we code the whole feature before sending the PR or testing it, issues that we could have fixed early will surface late and will probably generate bugs and rework. That is inefficient.
It normally produces large pull requests. Again, if we code the whole feature, how many lines do you think your team member will need to review, and how long is it going to take? Don’t be the guy who sends a pull request with hundreds of lines and asks for it to be reviewed as fast as possible.
It causes unavailability because of multi-step deployments and incompatibility with the previous version. Here is the main point: incompatibility is what causes the issues. If we build everything at once, we usually neglect the compatibility of the individual parts, which in turn forces us to deploy everything at once.

Most engineers like to focus on coding and are not trained to avoid risks; that’s why the natural process is the big bang deployment.

After all, avoiding big bang deployment is all about avoiding risks.

If your system is not critical, there is a high chance that you can do big bang deployments over and over again and everything works fine. So why would you want to avoid big bang deployment?

If you want to avoid the risk of being unavailable, or you want to remove the coordination effort of deploying new features, I recommend that you avoid big bang deployment. Otherwise, you can continue to work that way.

Stepwise deployment or incremental deployment

I will call stepwise deployment the opposite of the big bang deployment and I will describe some techniques that can help those who would rather apply the stepwise deployment.

Have a vision of the whole feature before starting to code

To resist the temptation to start coding right after discussing and identifying the services and components to change, set aside time to imagine the whole feature, think about the communication between the services, and ask yourself some questions.

If some component becomes unavailable, what will happen to the other components?
Is there another service or component that relies on one of the components that we will change?
Is it possible to version those changes so they can be delivered independently?

These questions will help you plan the changes before you start making them.

Slice it into small pieces

Sometimes an individual component will have more than one function, method, or API to change. Try to separate those tasks and think about them independently. Check whether you can make those changes independently of the whole feature.

For example, recently in one of our payment products, we set out to create a feature that allows customers to pay more than once per day. For some reason, it wasn’t allowed before. This is a huge change and it covers several components and services.

To enable it, we had to change the database table to support more than one payment record per day. We isolated that task and did it independently. We had to create new columns that are temporarily nullable; when the whole feature is done, we will change them to not null.
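As a rough illustration of what "temporarily nullable" means for the code that reads the table, here is a minimal Java sketch. The names (`PaymentRecord`, `sequenceInDay`) are hypothetical and not the real schema; the assumption is that rows written before the change carry no value and can be treated as the first payment of that day.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical row shape; the article does not give the real table or column names.
final class PaymentRecord {
    private final String customerId;
    private final LocalDate paidOn;
    // New column: nullable while the rest of the feature is still being delivered.
    private final Integer sequenceInDay;

    PaymentRecord(String customerId, LocalDate paidOn, Integer sequenceInDay) {
        this.customerId = customerId;
        this.paidOn = paidOn;
        this.sequenceInDay = sequenceInDay;
    }

    String customerId() { return customerId; }

    LocalDate paidOn() { return paidOn; }

    // Assumption: rows written before the change count as the first payment of the day.
    int sequenceInDayOrDefault() {
        return Optional.ofNullable(sequenceInDay).orElse(1);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2021, 1, 4);
        PaymentRecord oldRow = new PaymentRecord("c-1", day, null);
        PaymentRecord newRow = new PaymentRecord("c-1", day, 2);
        System.out.println(oldRow.sequenceInDayOrDefault());
        System.out.println(newRow.sequenceInDayOrDefault());
    }
}
```

Because the reader tolerates both shapes, the schema change and the code change can each go to production on their own.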

Sometimes we will have an “inconsistent” state for our service or component and it will look weird, but it is better than assuming the risk of making changes tied to each other.

Your deployment is not your release

This one is slightly different, but very important.

A deployment is the act of delivering new code. That code may not yet be available to customers, or to the components that will use it in the future, like the new columns we added in the last example.

A release is the act of making new features available to customers. Sometimes it doesn’t involve a deployment at all.

That’s why I don’t like requiring product manager or engineering manager approval for the deployment of new code. Sometimes it is not yet a complete new feature; it is just code that will enable new features.

Use a feature flag tool like LaunchDarkly to toggle new features on and off for customers. Or just use properties or a database to enable features for customers. Separate deployments from releases.
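The "just use properties" option can be sketched in a few lines of Java. This is only an illustration: `FeatureFlags`, `PaymentService`, and the flag name `payments.multiple-per-day` are made up for the example, and a flag service or a database table could sit behind the same `isEnabled` call.

```java
import java.util.Properties;

// Hypothetical flag store backed by java.util.Properties.
final class FeatureFlags {
    private final Properties props;

    FeatureFlags(Properties props) {
        this.props = props;
    }

    // Unknown flags are off, so deployed-but-unreleased code stays dark by default.
    boolean isEnabled(String flag) {
        return Boolean.parseBoolean(props.getProperty(flag, "false"));
    }
}

final class PaymentService {
    static final String MULTI_PAY_FLAG = "payments.multiple-per-day"; // hypothetical flag name

    private final FeatureFlags flags;

    PaymentService(FeatureFlags flags) {
        this.flags = flags;
    }

    // Returns true when a new payment would be accepted.
    boolean canPay(int paymentsAlreadyMadeToday) {
        if (flags.isEnabled(MULTI_PAY_FLAG)) {
            return true; // new behaviour: released by flipping the flag, not by deploying
        }
        return paymentsAlreadyMadeToday == 0; // old behaviour: one payment per day
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        PaymentService service = new PaymentService(new FeatureFlags(props));
        System.out.println(service.canPay(1)); // flag absent: old rule applies
        props.setProperty(MULTI_PAY_FLAG, "true");
        System.out.println(service.canPay(1)); // flag on: new rule applies
    }
}
```

The new branch can be deployed days before anyone turns it on, and turning it off again does not need a rollback.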

Use parallel change to do incremental code

Parallel change works for code changes of any size. If you are going to change an API to have a new required field, follow these steps.

Add this new field as an optional one
Change the client components that use this API
Change the field to required in the API

Each step should be done independently and delivered to the production environment independently. You can do this even if you have just one client component using the API. Why not? We usually change the client component and the API together, and that brings us all the problems we mentioned before.
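The three steps above can be sketched in Java as follows. Everything here is hypothetical: `CreatePaymentHandler`, the `currency` field, and the `"USD"` default are stand-ins, and the boolean passed to the constructor only models which step is currently deployed (in a real service, step 3 would be its own small code change).

```java
import java.util.Map;

// Hypothetical API handler; "currency" stands in for the new field.
final class CreatePaymentHandler {
    // false while step 1 is deployed, true once step 3 is deployed.
    private final boolean currencyRequired;

    CreatePaymentHandler(boolean currencyRequired) {
        this.currencyRequired = currencyRequired;
    }

    String handle(Map<String, String> request) {
        String currency = request.get("currency");
        if (currency == null) {
            if (currencyRequired) {
                throw new IllegalArgumentException("currency is required");
            }
            currency = "USD"; // assumed default for clients that have not migrated yet
        }
        return request.get("amount") + " " + currency;
    }

    public static void main(String[] args) {
        CreatePaymentHandler step1 = new CreatePaymentHandler(false);
        // Step 1: old clients (no field) and new clients (with field) both work.
        System.out.println(step1.handle(Map.of("amount", "10")));
        System.out.println(step1.handle(Map.of("amount", "10", "currency", "BRL")));
    }
}
```

Step 2, migrating the clients, happens between the two server deployments, so at no point does a deployed client talk to a server that rejects it.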

If the API is public and you don’t control its clients, you can create another version of the API and ask your clients to switch to the new version by a pre-defined deadline.
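A minimal Java sketch of serving two versions side by side, assuming path-based versioning. The paths `/v1/payments` and `/v2/payments`, the `currency` field, and the class name are all invented for the example; a real service would do this in its web framework's routing layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical router that keeps the old and new API versions live at the same time.
final class VersionedPaymentApi {
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();

    VersionedPaymentApi() {
        // v1 keeps the old contract until the announced deadline.
        routes.put("/v1/payments", request -> request.get("amount") + " USD");
        // v2 carries the new required field.
        routes.put("/v2/payments", VersionedPaymentApi::handleV2);
    }

    private static String handleV2(Map<String, String> request) {
        String currency = request.get("currency");
        if (currency == null) {
            throw new IllegalArgumentException("currency is required in v2");
        }
        return request.get("amount") + " " + currency;
    }

    String handle(String path, Map<String, String> request) {
        Function<Map<String, String>, String> handler = routes.get(path);
        if (handler == null) {
            throw new IllegalArgumentException("unknown path: " + path);
        }
        return handler.apply(request);
    }

    public static void main(String[] args) {
        VersionedPaymentApi api = new VersionedPaymentApi();
        System.out.println(api.handle("/v1/payments", Map.of("amount", "10")));
        System.out.println(api.handle("/v2/payments", Map.of("amount", "10", "currency", "BRL")));
    }
}
```

Removing the `/v1/payments` route after the deadline is then one more small, independent deployment.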

I know most engineers are aware of this, but I’ve seen a lot of engineers change both the client component and the service together anyway. As I said, sometimes you think it isn’t worth avoiding the risks. That’s ok; I personally prefer to avoid those risks.

Smaller changes

The same logic also applies to smaller changes. For example, if you need to change a method to add a new required parameter, you can just overload that method; many languages support this. Take an example in Java.

If we want to change the parameters of the method sendMessage from author, message, and datetime to a single, better object, we can just add a new method instead of changing the existing one.
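A minimal sketch of that overload, assuming a parameter object called `Message` and an enclosing class called `MessageSender` (both names are invented here, and the method body is a placeholder because the real delivery logic is not shown):

```java
import java.time.LocalDateTime;

// Hypothetical parameter object replacing the three loose arguments.
final class Message {
    final String author;
    final String text;
    final LocalDateTime sentAt;

    Message(String author, String text, LocalDateTime sentAt) {
        this.author = author;
        this.text = text;
        this.sentAt = sentAt;
    }
}

final class MessageSender {
    // Old signature: kept until every caller has migrated, then deleted.
    @Deprecated
    String sendMessage(String author, String message, LocalDateTime datetime) {
        return sendMessage(new Message(author, message, datetime));
    }

    // New overload: can be deployed first, with no caller changes required.
    String sendMessage(Message message) {
        // Placeholder for the real delivery logic.
        return message.author + ": " + message.text;
    }

    public static void main(String[] args) {
        MessageSender sender = new MessageSender();
        LocalDateTime now = LocalDateTime.of(2021, 1, 4, 9, 0);
        System.out.println(sender.sendMessage("ana", "hello", now));
        System.out.println(sender.sendMessage(new Message("ana", "hello", now)));
    }
}
```

Having the old method delegate to the new one keeps a single implementation, so both kinds of callers behave identically while the migration is in progress.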

After deploying it in the production environment we can change all the clients of this method, and finally, we can remove the old one.

If you are in a more agile company, I strongly recommend that you do it. It will help you think more about incremental changes.

Conclusion

These techniques are not everything you need to know about stepwise or incremental deployment. But I hope you can start to think about how to deliver code independently and what risks you are taking when you decide to go ahead with a big bang deployment.

The key point is creating code that can go to the production environment whenever you want.

When you are coding your next feature, before you commit and push your changes to the main branch, ask yourself whether that change can go directly to prod without side effects. If the answer is yes, there is a high chance that the commit is independent and you are heading in the right direction.