In the previous posts in this series, we saw what microservices are and how to start the journey of breaking services out of a monolithic application.
In the second post, I talked about not making GET calls across microservices. I received some questions about that. Here's one:
"Nice article on microservices. But I did not get the reasoning behind
'You shouldn't make a query/get call to another microservice.'"
My reply to that is below:
As a service, you should have your own store for the data you need. If you don't, what happens when the service you're making GET calls to goes down for an extended period? You are blocked. The function that depends on it now fails. What if you have multiple such dependencies? Each one multiplies your reasons to fail.
To take this further, too many blocking GET calls can result in a system that stalls completely. What if the individual calls have different SLAs? Do we block the entire request for the slowest call?
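To make the failure math concrete, here's a small sketch. The service names and per-dependency numbers are hypothetical, purely for illustration; the point is that chained blocking calls compound both latency and failure probability:

```python
# Hypothetical per-dependency figures, for illustration only.
deps = {
    "pricing":   {"availability": 0.999, "p99_ms": 50},
    "inventory": {"availability": 0.995, "p99_ms": 200},
    "reviews":   {"availability": 0.990, "p99_ms": 800},
}

# If the request blocks on every call in sequence, worst-case latency
# is the sum of the call latencies, and the request only succeeds if
# *every* dependency succeeds.
combined_availability = 1.0
worst_case_latency_ms = 0
for dep in deps.values():
    combined_availability *= dep["availability"]
    worst_case_latency_ms += dep["p99_ms"]

print(f"combined availability: {combined_availability:.4f}")  # ~0.9841
print(f"worst-case latency: {worst_case_latency_ms} ms")      # 1050 ms
```

Three individually "fine" dependencies already push the composed request below 99% availability, and each new blocking call makes it worse.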
One of the projects I worked on in the past had these types of calls spread throughout the system. In the worst cases, HTTP requests took well over a minute to return. Is that acceptable?
To counter this, the team moved towards caching entire relational data stores. Pulling all that information down became very tricky: the sync ran once a month, and the job itself took days to finish, pushing the changes back even further. I have seen many teams adopt this type of solution, eventually run into the scale problem, and then settle on syncing the data through messaging.
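A minimal sketch of that messaging-based sync, with a hypothetical event shape and an in-memory dict standing in for the service's own store. Instead of a monthly bulk job, each change arrives as a small message and is applied incrementally:

```python
# Hypothetical local store owned by this service (a dict stands in
# for a real database).
local_store = {}

def handle_customer_changed(event: dict) -> None:
    """Apply a change event published by the owning service to our
    local copy of the data. Applying events one at a time replaces
    the giant sync-everything job."""
    local_store[event["customer_id"]] = {
        "name": event["name"],
        "tier": event["tier"],
    }

# Simulate two change events arriving over a message bus; the later
# event simply overwrites the earlier state for that customer.
handle_customer_changed({"customer_id": "c1", "name": "Ada", "tier": "gold"})
handle_customer_changed({"customer_id": "c1", "name": "Ada", "tier": "platinum"})

print(local_store["c1"]["tier"])  # platinum
```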
So, what is the trade-off? I explained it below:
The data duplication that comes with this is an acceptable trade-off for scale. In other words, you are giving up some consistency, settling for eventual consistency, in exchange for scale and availability.
One of the teams I worked with did a good job of differentiating the types of messages. Some changes had to be pushed forward at a higher priority, some required more processing, and so on. We built different routes for those messages so that we could maintain our SLAs. The data was still cached, but we were able to get rid of the sync jobs that tried to sync all of the data at once. Messaging enables many patterns like this; the Enterprise Integration Patterns book goes deep into them, and I recommend it.
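The routing idea above can be sketched in a few lines. The route names and message shapes here are hypothetical, and a real system would use a message broker rather than in-process queues, but the principle is the same: urgent changes get their own route so they are never stuck behind slow, heavyweight ones.

```python
from queue import Queue

# One route (queue) per message class. In production these would be
# separate broker queues or topics, each with its own consumers and SLA.
routes = {"high": Queue(), "normal": Queue()}

def route_message(message: dict) -> None:
    # Changes that must be pushed forward quickly take the high-priority
    # route; everything else takes the normal route.
    priority = "high" if message.get("urgent") else "normal"
    routes[priority].put(message)

route_message({"type": "balance_changed", "urgent": True})
route_message({"type": "profile_updated"})

print(routes["high"].qsize(), routes["normal"].qsize())  # 1 1
```

Because each route has its own consumers, a backlog of low-priority messages can't delay the high-priority ones.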
I used this example afterwards:
For a banking app, you can say there is $100 in your account as of the last sync time, instead of showing $0 because you can't reach the account service. People are much more likely to freak out in the latter scenario than the former. This approach also scales, since we have relaxed the consistency level to eventual. It sounds radical, but you are trading strong consistency for scale and a better user experience. I hope that clears things up. I think I should write another post on this :)
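The banking example could look something like this sketch. The record shape and function names are made up for illustration; the point is that the app serves the last known value from its own store, stamped with the sync time, instead of failing or showing $0:

```python
from datetime import datetime, timezone

# Hypothetical cached balance record kept in the app's own store,
# updated whenever a sync message arrives from the account service.
cached_balance = {
    "amount": 100.00,
    "as_of": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
}

def balance_display(upstream_available: bool) -> str:
    """Render the balance for the user. When the upstream account
    service is unreachable, show the last synced value with its
    timestamp rather than an error or $0."""
    if upstream_available:
        return f"${cached_balance['amount']:.2f}"
    as_of = cached_balance["as_of"].strftime("%Y-%m-%d %H:%M UTC")
    return f"${cached_balance['amount']:.2f} as of {as_of}"

print(balance_display(upstream_available=False))
# $100.00 as of 2024-01-15 09:30 UTC
```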
I went on further to express the points below:
Another reason is the network. You can't take it for granted. In any of these situations, you don't want your customers to be affected by this implementation detail. The way to avoid that is to build your own store beforehand and keep it synced through messaging.
You don't have a distributed system if it isn't resilient to network loss. This needs to be considered on day one. Your customers are not going to care that your network failed; they want your app to work.
If your system needs too much cross-service chatter, it could be a sign of wrong service boundaries: the services are too fine-grained. They could be suffering from the nanoservices anti-pattern. It is a subtle problem; the SOA Patterns book goes into the details. Some of the services may need to be merged.
The Netflix approach
They go the hybrid route: hit the local cache store first and, if it is not available, fall back on cross-boundary calls. Josh Evans explains this very eloquently in his talk.
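The hybrid pattern can be sketched as cache-first with a remote fallback. Everything here (names, the dict cache, the stub remote call) is hypothetical, not Netflix's actual API; it just shows the control flow:

```python
# Local cache owned by this service; a dict stands in for a real
# cache store such as a local Redis or EVCache node.
cache = {"user:42": {"name": "Ada"}}

def fetch_remote(key: str) -> dict:
    # Stand-in for a cross-boundary HTTP call to the owning service.
    return {"name": "Ada", "source": "remote"}

def get_user(key: str) -> dict:
    """Serve from the local cache when possible; only cross the
    service boundary on a cache miss or cache outage."""
    try:
        value = cache.get(key)
        if value is not None:
            return value
    except Exception:
        pass  # cache store itself is down; fall through to the remote call
    return fetch_remote(key)

print(get_user("user:42"))  # hit: served from the local cache
print(get_user("user:99"))  # miss: served by the cross-boundary call
```

The key property is that the fast, autonomous path is tried first, and the expensive cross-boundary call is the exception rather than the rule.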
To summarize, we are giving up some consistency for autonomy. We can solve the consistency problem relatively easily, using data pumps or sync jobs, but autonomy has to remain as strong as possible. Without autonomy, you won't be able to realize any of the benefits of moving to a microservices architecture.