Idempotent receivers using message store

In Microservices/Distributed systems, messaging is a preferred form of integration. I covered some of the reasons in one of my previous posts. In these types of systems, whether we have pub-sub, event notification or commands, it is a good practice to make the receiving endpoints idempotent. It simply means the endpoint can receive the same message multiple times, and the end result is the same as receiving it once.

One way of ensuring this is through the design of the message body itself. For example, in an accounting system, instead of sending a message to deduct the withdrawn amount, we can send the resulting total. To explain further, if I had $100 in my account and took $20 out, a message that sets the total to $80 is naturally idempotent, while a message that deducts $20 is not: processing it twice would leave $60.
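To make the difference concrete, here is a minimal C# sketch using a hypothetical in-memory balance (the variable and method names are illustrative, not from any library). Each message is applied twice, as a redelivery would do, and only the set-the-total message leaves the state unchanged on the repeat.

```csharp
using System;

// Hypothetical in-memory account balance, for illustration only.
decimal balance = 100m;

// "Deduct by $20": every delivery changes the state again.
void ApplyDeduct(decimal amount) => balance -= amount;

// "Set the total to $80": repeated deliveries leave the same state.
void ApplySetTotal(decimal total) => balance = total;

ApplyDeduct(20m);
ApplyDeduct(20m); // the same message redelivered
Console.WriteLine($"Deduct applied twice: {balance}"); // 60

balance = 100m;
ApplySetTotal(80m);
ApplySetTotal(80m); // the same message redelivered
Console.WriteLine($"Set applied twice: {balance}"); // 80
```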

However, there are times when this is not possible. In the simple accounting system described above, to send a message that sets the total to $80, the sender needs to know the current state of the destination. This can bring in a whole new set of complexities. What if this is a pub-sub/event notification system? Are we going to keep track of everything happening in the subscribers? Even if this were a command, we would need to be sure of the current reality of the destination. This means increased coupling between the source and the destination. We also make the order of messages more important, and bugs around ordering are never fun to resolve.
Inadvertently, we could be pushing this distributed system toward stronger consistency, which is always challenging given the trade-offs described by the CAP theorem. I highly recommend reading everything around this topic in Designing Data-Intensive Applications.

So, it sounds easy to say, "just send a set-the-total-to-$80 message", but the reality is often more complex than that, especially when we need the system to be more available than consistent or when we are dealing with an event notification system with many subscribers. In these situations, it could be more practical to send a deduct-by-$20 message. Now what?

In my system, I am working with NServiceBus with RabbitMQ as a transport. NServiceBus provides Outbox functionality. It looks good, but it comes with some caveats.

Because the Outbox uses a single (non-distributed) database transaction to store all data, the business data and Outbox storage must exist in the same database.

I can't guarantee this in my system. Also, what if I want to use a NoSQL store for this?

The Outbox feature works only for messages sent from NServiceBus message handlers.

This is very limiting. RabbitMQ is a very capable queuing system. It comes with very easy-to-use clients, and enqueuing a message directly through them is common.

Enter the simple message store approach: we store a record of each processed message. In my case, the model looks like this:

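using System;

public class MessageStore
{
	public Guid MessageId { get; set; }          // Id of the processed message (duplicate check)
	public Guid GenericIdentifier { get; set; }  // Id of the business record the message updates
	public DateTime TimeStamp { get; set; }      // Timestamp of the message (stale-update check)
	public string MessageType { get; set; }      // Message type name, e.g. nameof(MyMessage)
}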

I can use any type of data store for this. I used SQL Server.

Now, in my NServiceBus message handlers, I can access this store as needed. To check whether a message was already processed, I can do a simple lookup based on the message Id. Duplicates can arrive for several reasons. For example, we can fetch a message from a RabbitMQ queue and process it, but the positive ack never reaches the cluster because of a network blip, so the message gets redelivered.
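A minimal sketch of that duplicate check follows. It assumes an in-memory HashSet<Guid> as a stand-in for the MessageStore table; Handle and handledCount are hypothetical names for illustration, not NServiceBus APIs.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the MessageStore table (the post uses SQL Server).
var processedMessageIds = new HashSet<Guid>();
var handledCount = 0;

// Hypothetical handler body: process a message only if its Id has not been recorded yet.
bool Handle(Guid messageId)
{
    if (processedMessageIds.Contains(messageId))
    {
        return false; // duplicate delivery: acknowledge it and do nothing
    }

    handledCount++;                     // the business logic would run here
    processedMessageIds.Add(messageId); // record the message as processed
    return true;
}

var id = Guid.NewGuid();
Console.WriteLine(Handle(id)); // True: first delivery is processed
Console.WriteLine(Handle(id)); // False: redelivery after a lost ack is discarded
```

Where the store and the business data share a database, I'd suggest writing the MessageStore row in the same transaction as the business update (or putting a unique constraint on MessageId); otherwise a crash between the two steps can let a duplicate through.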

Often, especially in data update scenarios, we want to discard a message if a more recent update has already been processed. This can happen if a message goes through retry logic and, before it comes up again, a more recent update arrives and gets processed. In this situation, I can do a check like this:

// Most recent processed message of this type for the same record (null if there is none)
var topProcessedMessage = dbContext.MessageStore
	.Where(m => m.MessageType == nameof(MyMessage) && m.GenericIdentifier == message.MyRecordIdentifier)
	.OrderByDescending(m => m.TimeStamp).FirstOrDefault();

I can compare the TimeStamp of that stored message against the incoming message's timestamp and discard the incoming message if it is earlier than the stored one. Otherwise, I process the message and store the necessary values in the MessageStore.
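Here is a minimal, self-contained sketch of that staleness check. It assumes an in-memory list of tuples in place of the SQL Server table, and HandleUpdate is a hypothetical helper; the LINQ query mirrors the dbContext.MessageStore query in the post, but projects to DateTime? so that "no previous message" comes back as null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the MessageStore table; the tuple fields mirror the model's properties.
var store = new List<(Guid MessageId, Guid GenericIdentifier, DateTime TimeStamp, string MessageType)>();

// Hypothetical handler logic: returns true if the update was applied, false if it was discarded as stale.
bool HandleUpdate(Guid messageId, Guid recordId, DateTime sentAt, string messageType)
{
    // Newest processed message of this type for this record, or null if there is none.
    DateTime? latest = store
        .Where(m => m.MessageType == messageType && m.GenericIdentifier == recordId)
        .OrderByDescending(m => m.TimeStamp)
        .Select(m => (DateTime?)m.TimeStamp)
        .FirstOrDefault();

    // Discard the incoming message if it is earlier than an update we already processed.
    if (latest.HasValue && sentAt < latest.Value)
    {
        return false;
    }

    // ...apply the business update here, then record the message...
    store.Add((messageId, recordId, sentAt, messageType));
    return true;
}

var orderId = Guid.NewGuid();
var start = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);
Console.WriteLine(HandleUpdate(Guid.NewGuid(), orderId, start.AddMinutes(5), "MyMessage")); // True
Console.WriteLine(HandleUpdate(Guid.NewGuid(), orderId, start, "MyMessage"));               // False: stale retry
```

Messages with an equal timestamp are still processed here, matching the "earlier than the stored one" rule above.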

This way, my subscriber/handler only has to worry about what's relevant to it. I can easily discard processed and stale messages. If multiple messages are going to update the same entity, I won't have to clutter that entity with many different LastUpdated timestamp columns. This store can easily be expanded to hold serialized messages, their checksums, etc. As we are not adding this to a pipeline, the behavior stays optional and is not applied by default to all the handlers.

The downside of this approach is, of course, that the number of records in the store can get out of hand quickly. We will see how to handle this situation in the next post.

Why should you avoid get calls across microservices?

In the previous posts in this series, we saw what microservices are and how to start the journey towards broken out services from the monolithic application.

In the second post, I talked about not having get calls across microservices. On that, I received some questions. Here's one:

"Nice article on microservices.
But i did not get reasoning behind
"You shouldn't make a query/get call to another microservice."

My reply to that is below:

As a service, you should have your own store for the data you need. If you don't, what happens when the service you're using for get calls goes down for an extended period of time? You are blocked, and the function that depends on it now fails. What if you have multiple dependencies like this? You just multiplied your reasons to fail.
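Some rough arithmetic shows why. Assuming, purely for illustration, that each dependency is independently available 99.9% of the time, a request that needs all n of them is available only 0.999^n of the time: about 99.5% with five dependencies and about 99.0% with ten. This sketch just computes that product.

```csharp
using System;

// Assumption for illustration: each dependency is independently up 99.9% of the time.
const double perServiceAvailability = 0.999;

// A request that needs every synchronous get call to succeed is up only when all dependencies are up.
double CombinedAvailability(int dependencies) => Math.Pow(perServiceAvailability, dependencies);

foreach (var n in new[] { 1, 5, 10 })
{
    Console.WriteLine($"{n} dependencies: {CombinedAvailability(n):P2}");
}
```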

To explain this further, too many blocking get calls may result in a system that gets completely stalled. What if the individual calls have different SLAs? Do we block the entire request on the slowest call?

One of the projects I worked on in the past had these types of calls spread throughout the system. In the worst cases, HTTP requests took well over a minute to return. Is this acceptable?

To counter this, the team moved toward caching entire relational data stores. Pulling all that information down became very tricky. The data sync happened once a month, and the sync job itself took days to finish, pushing the changes even further back. I have seen many project teams go toward this type of solution and eventually run into the scale problem. Then they settle on syncing the data using messaging.

So, what is the trade-off? I explained it below:

The data duplication that may come with this is an acceptable trade-off for scale. In other words, you are giving up some consistency and moving toward eventual consistency for scale and availability.

One of the teams I worked with did a good job differentiating the types of messages. Some of the changes had to be pushed forward at a higher priority, some required a little more processing, etc. We built different routes for those messages so that we could maintain our SLAs for them. The data was still cached, but we were able to get rid of the sync jobs that tried to sync all the data at once. Messaging enables different kinds of patterns. I recommend the Enterprise Integration Patterns book, which goes deep into the different kinds of messaging patterns.
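As a rough illustration of those separate routes, here is a minimal content-based router (one of the patterns in that book) in C#. The message type names and queue names are hypothetical; in a real system the chosen queue would be handed to the transport's send API, which is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing table: message type -> destination queue. Names are illustrative only.
var routes = new Dictionary<string, string>
{
    ["PriceChanged"] = "sync.high-priority",
    ["CatalogImported"] = "sync.bulk",
};

// Content-based router: pick a queue per message type, falling back to a standard route.
string RouteFor(string messageType) =>
    routes.TryGetValue(messageType, out var queue) ? queue : "sync.standard";

Console.WriteLine(RouteFor("PriceChanged"));    // sync.high-priority
Console.WriteLine(RouteFor("CustomerRenamed")); // sync.standard
```

Because each route is its own queue, each one can be given its own consumers and scaled to meet its own SLA.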

I used this example afterwards:

For a banking app, you can say there is $100 in your account as of the last sync time, instead of $0 because you can't reach that service. People are much more likely to freak out in the latter scenario than in the former. This method scales too, since we have relaxed the consistency level to eventual. It sounds radical, but you are trading strong consistency for scale and user experience. I hope that clears things up for you. I think I should write another post on this :)

I went on further to express the points below:

Another reason is the network. You can't take it for granted. In any of these situations, you don't want your customers to be affected by this implementation detail. The way to avoid this is to build your own store beforehand and keep it synced through messaging.
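Here is a minimal C# sketch of such a local store. It assumes a Dictionary as the store and a hypothetical OnBalanceChanged handler that a message subscription would call; queries are answered locally with an "as of" time, as in the banking example above, so they keep working when the owning service is unreachable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local read model owned by the consuming service:
// account id -> (last known balance, time of the event that produced it).
var balances = new Dictionary<Guid, (decimal Balance, DateTime AsOf)>();

// Called by a message handler when the owning service publishes a balance change.
void OnBalanceChanged(Guid accountId, decimal newBalance, DateTime occurredAt)
{
    // Ignore events older than what we already hold (out-of-order delivery).
    if (balances.TryGetValue(accountId, out var current) && occurredAt < current.AsOf)
    {
        return;
    }
    balances[accountId] = (newBalance, occurredAt);
}

// Queries never call the owning service; they read the local copy.
string Describe(Guid accountId) =>
    balances.TryGetValue(accountId, out var entry)
        ? $"${entry.Balance} as of {entry.AsOf:u}"
        : "No data synced yet";

var account = Guid.NewGuid();
OnBalanceChanged(account, 100m, new DateTime(2024, 1, 1, 9, 0, 0, DateTimeKind.Utc));
Console.WriteLine(Describe(account));
```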

You don't have a distributed system if the system is not resilient enough to tolerate network loss. This is something that needs to be considered on day one. Your customers are not going to care that your network failed. They want your app to work.

If your system needs too much cross-service chatter, it could be a sign of wrong system boundaries, meaning the services are too fine-grained. Your services could be suffering from the nanoservices anti-pattern. It is a subtle problem, and the SOA Patterns book goes into its details. Some of the services may require merging.

The Netflix approach

Netflix takes a hybrid route. They hit the local cache store first and, if it is not available, fall back on cross-boundary calls. Josh Evans explains this very eloquently in this talk.
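A minimal cache-first sketch in C#, assuming a Dictionary as the local cache and a hypothetical FetchFromOwningService stand-in for the cross-boundary call. It illustrates the hybrid idea only; it is not Netflix's implementation, and writing the fetched value back into the cache is my own design choice here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local cache, populated elsewhere (for example through messaging).
var cache = new Dictionary<string, string> { ["sku-1"] = "Tennis racquet" };
var remoteCalls = 0;

// Stand-in for a cross-boundary call; a real one would be an HTTP/RPC client with a timeout.
string FetchFromOwningService(string key)
{
    remoteCalls++;
    return $"remote:{key}";
}

// Cache-first read with fallback to the remote call on a miss.
string GetProduct(string key)
{
    if (cache.TryGetValue(key, out var cached))
    {
        return cached;
    }
    var value = FetchFromOwningService(key);
    cache[key] = value; // keep the next read local
    return value;
}

Console.WriteLine(GetProduct("sku-1")); // served from the cache
Console.WriteLine(GetProduct("sku-2")); // falls back to the remote call
```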

To summarize, we are giving up some consistency for autonomy. We can address the consistency problem with data pumps or sync jobs, but autonomy has to remain as strong as possible. Without autonomy, you won't be able to realize any of the benefits of moving toward a microservices architecture.

Tackling a monolith - where to begin?

In the previous post, we saw what microservices are. We also looked at how they differ from the big monolithic apps.

In the startup/earlier days, you did a File => New Project in Visual Studio or your favorite IDE, designed a perfectly normalized SQL database and brought in an ORM on top of it for reasons. It worked great for a while. You brought in teams of people to do heads-down development. The app has grown in size along with the user base and the org. With that, you are running into problems that you didn't anticipate before. The application feels slower, timeout complaints are coming in and bug reports are increasing. The backlog of new features keeps growing. The tech debt is out of control. Engineering teams are frustrated at best because they know there are problems but their hands are tied. They have pushed some of the processing into scheduled jobs, but now those jobs are running into problems. The app feels like a giant monolith. The business functions are vertical in nature, but the supporting IT teams and solutions are cross-cutting. This has increased the coupling and complexity of the system. Even small changes require a huge app deployment lasting hours, which scares everyone. Delivery to production takes longer than ever, adding to the frustration. The current infrastructure has been pushed beyond its limits, and the question of scaling further is very real.

Breaking this app up makes sense based on all the material out there, but where do we begin? How do we create microservices out of it while delivering the features that keep the business viable? Tackling the monolith seems like a necessary step, but how do we justify the cost of breaking it up? How do we convince the business that this is not just another refactoring effort from the engineering team? Very real questions! This is a very common scenario these days. Where do we begin when an El Capitan of a monolith is staring us right in the face?

I have been there. Now, let's take a look at some of the ways.

Take clues from refactoring practices

Over a period of time, all codebases accumulate some tech debt. There are times when you deliberately punt on problems. The technology changes, the nature of problems is never static and yesterday's solution becomes obsolete. One approach I've taken is to refactor code just a little bit at a time, only around the area that your new feature code is going to touch. Of course, you'd need to coordinate this with your pull request reviewers so that they don't get surprised by more code changes than necessary. This makes your code a little better today than it was yesterday. Since it is incremental and scoped to a small area, it can become the team's habit in a short amount of time. In contrast, I was never able to sell big bang refactoring to the business or project management. Coincidentally, I found this supporting post from Steve Smith stating essentially the same thing. So, when it comes to tackling a monolith, it is essential to take an incremental approach instead of a big bang one.

Business Alignment

One other step that can be done in parallel is to really understand the business domain from the perspective of bounded contexts. We can start isolating the business processes so that they don't leave their boundaries. This is not a trivial task to undertake. Often, it involves a lot of different people from the org and knowledge of the existing processes. If the app is constructed in vertical slices instead of layers, as shown by Jimmy Bogard in this talk, we can consider our app ahead of the game.

One reason we want to do this is to understand the misalignment that horizontal layering has caused throughout the organization in terms of teams and functions. Cross-cutting services for everything may seem appealing at first because of code reuse, upfront time savings and cost, but they insidiously increase tight coupling. That happens because the reason for change is not considered. Over a period of time, the gap between the naturally evolving business processes and the horizontal structure of the supporting teams and applications keeps growing. This leads to too much cross-communication between services. If they are disjointed services, that could end up making the system incredibly chatty.

Another big purpose behind this is to get organizational buy-in for this type of project. It is important for the business to understand why vertical slicing is necessary, how much it currently costs to deliver a piece of code, how this can directly improve delivery and the possibilities it opens up from the end user's perspective. It can also show them places to improve efficiency by cutting down redundancies.

After identifying these areas, we want to establish the communication patterns between the different pieces and start establishing service level agreements for them. For example, when a user buys an item on an e-commerce site, do we show them a spinner until it gets shipped, or is it OK to send them an email saying we will send updates as the order goes through the different steps of processing?
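The second option can be sketched as a request path that only enqueues and replies, with a separate consumer working through the queue. This minimal C# version assumes an in-memory Queue<Guid> in place of a real message broker; PlaceOrder, ProcessPendingOrders and the notification texts are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a message queue between the web front end and order processing.
var orderQueue = new Queue<Guid>();
var notifications = new List<string>();

// The request path only enqueues and replies; it does not wait for shipping.
string PlaceOrder(Guid orderId)
{
    orderQueue.Enqueue(orderId);
    return "Thanks! We will email you updates as your order is processed.";
}

// A separate consumer drains the queue at its own pace and notifies the customer per step.
void ProcessPendingOrders()
{
    while (orderQueue.TryDequeue(out var orderId))
    {
        notifications.Add($"Order {orderId}: payment confirmed");
        notifications.Add($"Order {orderId}: shipped");
    }
}

Console.WriteLine(PlaceOrder(Guid.NewGuid()));
ProcessPendingOrders();
notifications.ForEach(Console.WriteLine);
```

The agreement with the user changes from "wait until it ships" to "you will hear from us", which lets order processing be scaled and retried independently of the web request.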

Choosing the area

It is very important to have monitoring in place for a variety of reasons. In this case, we can generate a heat map to identify which parts of the app are used more heavily than others. This can help us assign weights to these areas. The most heavily used and slowest-performing areas could be a good starting place.
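One simple way to turn that heat map into weights is to multiply usage by latency, roughly the total time users spend waiting in each area. The sketch below assumes that weighting; the area names and numbers are hypothetical inputs, not real measurements. Sorting by latency alone would put the rarely used admin reports first, which is why usage belongs in the weight.

```csharp
using System;
using System.Linq;

// Hypothetical monitoring numbers per app area (illustrative only, not real measurements).
var areas = new[]
{
    (Name: "Checkout", RequestsPerDay: 120_000, AvgLatencyMs: 900.0),
    (Name: "Search", RequestsPerDay: 400_000, AvgLatencyMs: 350.0),
    (Name: "Admin reports", RequestsPerDay: 2_000, AvgLatencyMs: 4_000.0),
};

// One possible weight: total milliseconds users spend waiting in an area per day.
double Weight(int requestsPerDay, double avgLatencyMs) => requestsPerDay * avgLatencyMs;

var ranked = areas
    .OrderByDescending(a => Weight(a.RequestsPerDay, a.AvgLatencyMs))
    .ToList();

foreach (var area in ranked)
{
    Console.WriteLine($"{area.Name}: {Weight(area.RequestsPerDay, area.AvgLatencyMs):N0} ms waited per day");
}
```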

One other thing to consider is how many new features are coming down the pipe, and whether the system is still in active development. If there is too much happening at once, changing the architecture may delay delivery. The real answer here is that it depends on the type of changes and how much time we can buy from the business. If the system is no longer under active development, it could be a candidate for retirement, or it could be isolated enough for retirement in the near future. Either way, it is not a good candidate.

Remote procedure calls

This is another area that gets in the way of scaling apps. You don't want to launch an accidental DDoS attack on someone's service because you spawned hundreds of container instances that all call it directly in process. Also, the network shouldn't be taken for granted. This is another good area to start analyzing right away. Eliminating the read RPC calls may not be a trivial task, but identifying them is an important step. Our app is not going to scale very well if we are hardcoding the calls to other services. Service discovery with Consul can be helpful. We will see the details on Consul in later posts.

Shared relational databases

One of the most problematic, and yet very common, integration patterns is the shared relational database. It creates coupling that becomes very hard to remove. Once we start introducing foreign keys into it, it becomes even harder to scale. Is it fair to say the degree of normalization is inversely proportional to the ability to scale the database? Even if some magical indexing strategy makes the calls very responsive, its maintenance is cumbersome at best. I'd recommend this talk by Pat Helland; he has also written some papers in this area. Knowing this, we can start identifying some of the most heavily used tables, their relationships and how applications are using them. This sort of analysis can give teams ideas about reducing the dependencies on these tables and setting them up for a complete breakup in the future.


Before jumping too far into the container world, you will have to increase organizational awareness and education around containers. This is especially true in Windows environments, since most containers come from the Linux world. How do we leverage them? One place to try them out is in your build and test systems, as mentioned in this .NET Rocks! episode. The risk is lower there, and a faster build and test pipeline can help deliver software more quickly.

Parting words

All of these can be done in parallel, with one goal in mind: identifying areas that could get in the way of breaking up a monolith. After identification, teams can slowly start making changes in those areas for some immediate, cheap wins. I hope this helps.

What are microservices?

If you pick up a list of talks for any developer conference, you will find at least one talk related to Microservices. Like many, I have been fascinated by them for some time now. The obvious question for me was: what are microservices? I started reading up on them. I came across Martin Fowler's post. I listened to industry experts such as Clemens Vasters. From the books to read on the topic, I started with Building Microservices. Before we proceed further, it is necessary to understand what a service is.

What's a service?

The SOA Patterns book says:

Service should provide a distinct business function and it should be a coarse-grained piece of logic. One of the characteristics of the service is autonomy, which means the service should be mainly self-sufficient.

I like this definition. The main point to notice here is autonomy. This means a service can be deployed at any time, even multiple times a day, without affecting its consumers. It also tries to keep failures isolated to itself. Loose coupling between these services is another hallmark of service-oriented architecture. So, what are microservices then? What's new about them? Are they just SOA done right?

Microservices definition or properties?

There is no industry-wide, fully agreed-upon, one-sentence definition of microservices. However, Building Microservices lays out the following set of properties:

  • They are autonomous.
  • Their boundaries align with business boundaries.
  • They are small.

Now, how small is small enough? Until it doesn't feel too big? To me, they should be small enough to maintain cohesion. Business boundaries also give us a good idea of their size. If your service spans multiple business areas, it is too big. If services are too small and there are too many boundaries, you may end up with the so-called death star architecture.

To further the point on autonomy, I should be able to scale these services independently. They should also have their own data sources instead of sharing a monolithic one. I call this data independence. One good point I've heard is:

You shouldn't make a query/get call to another microservice.

You can hear Scott Bellware and Scott Hanselman discussing the details here.

So, microservices are a combination of these properties. Here are a few other great talks that made things very clear to me:


I tend to shy away from playing the analogy card too much, but if I get pushed hard for one on this topic, I am likely to use this.

That drawing represents a gas stove. Let's say we have to make chicken curry, steam some veggies and make Indian roti (flat bread) for dinner. I can cook all of these at the same time, each on its own burner, without one affecting another. If I mess one up, the others are not affected. I can take each one off its burner at any time. That's autonomy. Each burner gets a steady supply of gas, and each pot has its own ingredients. That's data independence. Those ingredients stay in the pot and don't leave it until I put them on my plate (composite UI). My boundaries are clearly defined. I can use an extra burner if I need more of something. That is the scalability aspect of the system.

In the real world

Let's apply this to my favorite tennis equipment site.

All the red rectangle areas could be services of their own with their own databases. They can be scaled differently with different deployment policies.

One of the highly scaled backend service apps I worked on had 12 different NServiceBus endpoints. They were scaled differently, at least in terms of instance counts. They were deployed separately. They were sort of fault tolerant. They processed tens of millions of records every day.
A couple of components in the chain of endpoints made direct calls to SOAP services backed by monolithic databases. The stored procedures involved were in the range of 5k-7k lines. No one knew how to optimize them or what rules were in them. These components took longer to process than everything else. If they failed, they couldn't process their backlog at all. Was this a microservices system? Not really, because the endpoints were sharing a monolithic database through RPC calls. Breaking that up was not a trivial task. Moreover, the endpoints did not represent business boundaries very well.

Avoid the tech evangelism trap

A good chunk of folks in our industry are trying to promote something. More often than not, it is their product FOO, and it is better than everyone else's. It is the exact thing you are going to need for the foreseeable future. How many times do we hear this type of stuff? Container products are pushed very hard these days in the microservices space. The evangelists/promoters will tell you what's right about them but don't go into where they don't apply. They tell you enough to push you off the cliff but leave the flying part to you. Very often, people just get hung up on these things. When you ask them which problem these actually solve, you get crickets back. I am not saying all of these are bad ideas, but you don't have to use one to have microservices. For an exhaustive list of what does and doesn't make a microservice, I'd refer to Jimmy Bogard's post.

Where do I begin?

It is usually easier to break up a monolith into microservices than to start out with them in a greenfield project. So, how do you divide these monoliths? There is no silver bullet; you have to take it on a case-by-case basis. A good place to start is monitoring your monolith and developing a heat map. Kelsey Hightower says "If you tell me that the app is slow, you got to be able to tell me why." on this episode of Hanselminutes. They touch on a number of topics in this area along with horizontal scalability. This is where containers come into the picture.

Another place is to dig deeper into the business domain and start developing some consensus around boundaries there. I've seen implementations go completely awry when this step was missed.

Parting words

I hope this cleared up some of the clouds around the definition of microservices. They are not a solution that you can slap on every problem out there. I wouldn't break up my blog into a microservices architecture. We will cover more topics in this area later.
