Tackling a monolith - where to begin?
In the previous post, we saw what microservices are and how they differ from big monolithic apps.
In the early startup days, you did a File > New Project in Visual Studio or your favorite IDE, designed a perfectly normalized SQL database, and brought in an ORM on top of it for reasons. It worked great for a while. You brought in teams of people to do heads-down development. The app has since grown in size along with the user base and the org, and you are running into problems you didn't anticipate. The application feels slower, complaints about timeouts are coming in, and bug reports are increasing. The backlog of new features keeps growing. The tech debt is out of control. Engineering teams are frustrated at best, because they know there are problems but their hands are tied. They have pushed some of the processing into scheduled jobs, but now those jobs are running into problems too.
The app feels like a giant monolith. The business functions are vertical in nature, but the supporting IT teams and solutions are cross-cutting, which has increased the coupling and complexity of the system. Even small changes require a huge deployment lasting hours, which scares everyone. Delivery to production takes longer than ever, adding to the frustration. The current infrastructure has been pushed beyond its limits, and the question of scaling further is very real.
Breaking up this app makes sense based on all the material out there, but where do we begin? How do we create microservices out of it while still delivering the features that keep the business viable? Tackling the monolith seems like a necessary step, but how do we justify the cost of breaking it up? How do we convince the business that this is not just another refactoring effort from the engineering team? Very real questions! This is a very common scenario these days. Where do we begin when an El Capitan of a monolith is staring us right in the face?
I have been there. Now, let's take a look at some of the ways to approach it.
Take clues from refactoring practices
Over time, every codebase accumulates some tech debt. Sometimes you deliberately punt on problems. Technology changes, the nature of the problems is never static, and yesterday's solution becomes obsolete. One approach I've taken is to refactor code a little at a time, only around the area that my new feature code is going to touch. Of course, you'd need to coordinate this with your pull request reviewers so they aren't surprised by more code changes than necessary. This makes the code a little better today than it was yesterday. Because it is incremental and scoped to a small area, it can become a team habit in a short amount of time. By contrast, I was never able to sell big bang refactoring to the business or to project management. Coincidentally, I found this post from Steve Smith stating essentially the same thing. So, when it comes to tackling a monolith, it is essential to take an incremental approach instead of a big bang one.
Business Alignment
Another step that can be done in parallel is to really understand the business domain from the perspective of bounded contexts. We can start isolating the business processes so that they stay within their boundaries. This is not a trivial task. It often involves many different people from across the org, along with knowledge of the existing processes. If the app is built in vertical slices instead of layers, as Jimmy Bogard shows in this talk, our app is ahead of the game.
One reason we want to do this is to understand the misalignment that horizontal layering has created across the organization's teams and functions. Cross-cutting services for everything may seem appealing at first because of code reuse and the upfront savings in time and cost, but they insidiously increase tight coupling. That happens because each piece's reasons for change are not taken into account. Over time, the gap widens between the naturally evolving business processes and the horizontal structure of the supporting teams and applications. This leads to excessive cross-communication between services. If those services are also disjointed, the system can end up incredibly chatty.
Another big purpose behind this is to get organizational buy-in for this type of project. It is important for the business to understand why vertical slicing is necessary, how much it currently costs to deliver a piece of code, how vertical slicing can directly improve delivery, and what possibilities it opens up from the end user's perspective. It can also show them where to improve efficiency by cutting down on redundancies.
After identifying these areas, we want to establish the communication patterns between the different pieces and start defining service level agreements for them. For example, when a user buys an item on an e-commerce site, do we show them a spinner until the item ships, or is it OK to send an email saying we will send updates as the order moves through the different processing steps?
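To make the asynchronous option a little more concrete, here is a minimal Python sketch, assuming an in-process `queue.Queue` stands in for a real message broker. The names `place_order`, `run_fulfillment`, `OrderEvent` and the list of stages are hypothetical, made up for illustration. The point is only the shape of the interaction: the user gets an acknowledgement right away, and each later stage produces a notification instead of holding the request open.

```python
from dataclasses import dataclass
from queue import Queue

# Hypothetical processing stages an order moves through, in order.
STAGES = ("accepted", "paid", "packed", "shipped")


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    stage: str


def place_order(order_id: str, events: Queue) -> str:
    """Acknowledge the order right away and hand the rest of the work to a queue."""
    events.put(OrderEvent(order_id, STAGES[0]))
    return STAGES[0]  # the user sees this immediately, no spinner


def run_fulfillment(events: Queue, notifications: list) -> None:
    """Drain the queue: notify the user about each stage, then publish the next one."""
    while not events.empty():
        event = events.get()
        # Stand-in for an email or push notification.
        notifications.append(f"Order {event.order_id}: {event.stage}")
        next_index = STAGES.index(event.stage) + 1
        if next_index < len(STAGES):
            events.put(OrderEvent(event.order_id, STAGES[next_index]))


if __name__ == "__main__":
    events: Queue = Queue()
    print(place_order("A-100", events))
    sent: list = []
    run_fulfillment(events, sent)
    for message in sent:
        print(message)
```

In a real system the queue would be durable and each stage would likely live in its own service; the in-memory queue only keeps the sketch self-contained.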
Choosing the area
It is very important to have monitoring in place for a variety of reasons. In this case, we can generate a heat map to identify which parts of the app are used more heavily than others. This can help us assign weights to those areas. The most heavily used and slowest-performing areas could be a good starting place.
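As a rough sketch of turning monitoring data into weights, the Python below assumes the monitoring tool can export one `(area, latency_ms)` pair per request. The function name, the input shape and the 50/50 weighting are all assumptions to tune, not a prescribed formula. It normalizes call volume and average latency against the busiest and slowest areas and ranks areas by a weighted sum, which is the same information a heat map shows visually.

```python
from collections import defaultdict
from statistics import mean


def rank_areas(requests, usage_weight=0.5, latency_weight=0.5):
    """Rank app areas by a weighted mix of call volume and average latency.

    requests: iterable of (area, latency_ms) pairs exported from monitoring.
    Returns (area, score) pairs, highest score (best starting candidate) first.
    """
    latencies = defaultdict(list)
    for area, latency_ms in requests:
        latencies[area].append(latency_ms)
    if not latencies:
        return []

    # Normalize against the busiest and the slowest area so both factors fall between 0 and 1.
    max_calls = max(len(values) for values in latencies.values())
    max_latency = max(mean(values) for values in latencies.values())

    scores = {}
    for area, values in latencies.items():
        usage = len(values) / max_calls
        slowness = mean(values) / max_latency if max_latency else 0.0
        scores[area] = usage_weight * usage + latency_weight * slowness
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("checkout", 900), ("checkout", 1100), ("checkout", 1000),
        ("search", 200), ("search", 300),
        ("profile", 150),
    ]
    for area, score in rank_areas(sample):
        print(f"{area:10} {score:.2f}")
```

Average latency is used here for simplicity; a percentile such as p95 may reflect what users feel more accurately if the monitoring tool exposes it.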
Another thing to consider is how many new features are coming down the pipe, and whether the system is still in active development at all. If too much is happening at once, changing the architecture may delay delivery. The real answer here is that it depends on the type of changes and how much time we can buy from the business. If the system is no longer under active development, it could be a candidate for retirement, or it could be isolated enough to be retired in the near future. Such a system is not a good place to start.
Remote procedure calls
This is another area that gets in the way of scaling apps. You don't want to launch an accidental DDoS attack on someone's service because you spun up hundreds of container instances that all call it directly from within their process. Also, the network shouldn't be taken for granted. This is another good area to start analyzing right away. Eliminating read RPC calls may not be trivial, but identifying them is an important step. Our app is not going to scale very well if we hardcode calls to other services. Service discovery with Consul can be helpful here. We will look at Consul in more detail in later posts.
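Here is a minimal Python sketch of two guards against the problems described above, under stated assumptions: a token-bucket limiter caps how fast each instance calls a downstream service, and the service address comes from an injected `resolve` function rather than a hardcoded URL. `TokenBucket`, `ServiceClient`, and the injected `resolve` and `send` callables are hypothetical; in practice `resolve` might be backed by a service registry such as Consul, but that wiring is not shown here.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` calls, refilling at `rate` calls per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ServiceClient:
    """Calls a downstream service by name instead of a hardcoded address."""

    def __init__(self, service_name, resolve, send, limiter):
        self.service_name = service_name
        self.resolve = resolve  # hypothetical lookup, e.g. backed by a service registry
        self.send = send        # hypothetical transport, e.g. an HTTP GET
        self.limiter = limiter

    def call(self, path: str):
        if not self.limiter.try_acquire():
            # Fail fast so the caller can back off instead of hammering the service.
            raise RuntimeError(f"rate limit reached for {self.service_name}")
        base_url = self.resolve(self.service_name)
        return self.send(base_url + path)


if __name__ == "__main__":
    registry = {"inventory": "http://10.0.0.5:8080"}  # stand-in for service discovery
    client = ServiceClient(
        "inventory",
        resolve=registry.__getitem__,
        send=lambda url: f"GET {url}",
        limiter=TokenBucket(rate=5, capacity=2),
    )
    for _ in range(3):
        try:
            print(client.call("/items/42"))
        except RuntimeError as error:
            print(error)
```

Note that this limit applies per instance, so the total load on the downstream service is still roughly the per-instance rate times the number of instances; it bounds the damage rather than removing the direct call.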
Shared relational databases
One of the most problematic, yet very common, integration patterns is the shared relational database. It creates coupling that becomes very hard to remove. Once we start introducing foreign keys into it, it becomes even harder to scale. Is it fair to say the degree of normalization is inversely proportional to the ability to scale the database? Even if some magical indexing strategy makes the calls very responsive, maintaining it is cumbersome at best. I'd recommend this talk by Pat Helland. He has also written some papers in this area. Knowing this, we can start identifying the most heavily used tables, their relationships, and how applications are using them. This sort of analysis can give teams ideas for reducing dependencies on these tables and setting them up for a complete breakup in the future.
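A first pass at this analysis can be as simple as counting table references in a query log. The Python sketch below assumes you can export `(application, sql_text)` pairs, for example from a profiler trace. The regex is deliberately naive (it only looks at the identifier after `FROM`, `JOIN`, `INTO` and `UPDATE`), and the function names are hypothetical. Tables used by more than one application are the shared-database coupling points worth looking at first.

```python
import re
from collections import Counter, defaultdict

# Deliberately naive: grabs the identifier after FROM, JOIN, INTO or UPDATE.
# Subqueries, CTEs, quoted or bracketed names, etc. would need a real SQL parser.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(query_log):
    """Count table references and record which applications touch each table.

    query_log: iterable of (application, sql_text) pairs.
    """
    hits = Counter()
    users = defaultdict(set)
    for app, sql in query_log:
        for table in TABLE_PATTERN.findall(sql):
            name = table.lower()  # treat Orders and orders as the same table
            hits[name] += 1
            users[name].add(app)
    return hits, users


def shared_tables(users):
    """Tables touched by more than one application: the coupling points to untangle."""
    return sorted(table for table, apps in users.items() if len(apps) > 1)


if __name__ == "__main__":
    log = [
        ("web", "SELECT * FROM Orders o JOIN Customers c ON o.customer_id = c.id"),
        ("billing", "UPDATE Orders SET paid = 1 WHERE id = 7"),
        ("reports", "select count(*) from orders"),
        ("web", "INSERT INTO AuditLog (msg) VALUES ('placed')"),
    ]
    hits, users = table_usage(log)
    print(hits.most_common())
    print(shared_tables(users))
```

Combining these counts with the foreign key relationships from the schema gives a rough map of which tables would be the hardest to pull apart.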
Containers
Before jumping too far into the container world, you will have to increase organizational awareness of and education around containers. This is especially true in Windows environments, since containers largely come from the Linux world. How do we leverage them? One place to try them out is in your build and test systems, as mentioned in this .NET Rocks! episode. The risk there is lower, and a faster build and test pipeline can help deliver software more quickly.
Parting words
All of these can be done in parallel, with one goal in mind: identifying the areas that could get in the way of breaking up a monolith. Once those areas are identified, teams can slowly start making changes in them for some immediate, cheap wins. I hope this helps.