What are Read and Write Models, CQRS and Why you should be using them

Software Development Nov 13, 2024

Business requires us to think about the same concept in several different ways. Take 𝕏 as an example: a user whose personal data is being displayed is not the same thing as a user whose posts are being retrieved, or a user that is being suggested as someone to follow.

If this is not designed properly, it leads to tight coupling between data representation and business logic, and that coupling causes problems as the application grows. CQRS, together with the use of write models and read models, is the solution that we suggest and discuss in this post.

CQRS and write models vs read models

The programming principle behind this kind of solution is CQRS, Command Query Responsibility Segregation. As flashy as it might sound, the premise of the pattern is pretty simple: we should split the part of a system that mutates data, the commands (writes), from the part that queries it (reads).

In traditional architectures (e.g. a CRUD architecture), there is no difference between the read and the write model. When we store a user, we persist an entity with the same shape that it will have when we query the database for it.

Also, if we add new fields to this model, every place where we read or persist it is affected. This is fine for simple models and operations, but it becomes hard to maintain. It tightly couples the data in our entities to the way we want to consume it: a change in how we want to see the data affects how we store it, and the other way around. In general, it also hurts scalability, reliability, and security.

In the CQRS approach, on the other hand, we have to clearly differentiate the two kinds of models and their responsibilities:

  • A write model is responsible for ensuring that the business rules are respected. It doesn’t care about how the information will be presented or who is going to consume it.
  • A read model is responsible for delivering information in the shape that its consumers need. It doesn’t care about how the information is created or updated.
[Figure: CQRS pattern diagram]

The underlying idea behind CQRS is to build write models (or domain models), where we take care of the invariants and the domain logic, plus as many read models as we need for the specific use cases that require them.

To better understand what a write model is, let’s think of the following example for the 𝕏 platform: the business tells us about a concept called “user”. This concept would probably have some business rules, such as:

  • A user must have a name, an ‘@’ and a birthday.
  • A user must be older than a certain age.
  • A user can be verified by 𝕏.
  • A user can change its biography or location.

All these rules can be enforced by a domain model. In the code, this business logic should live in a class that represents the concept, and any new business rule about users should be added there.

This is also where we will find the data needed to represent a user as a whole, such as the username, the “@” and the birthday, along with all the constraints that keep our model consistent.
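As a minimal sketch of such a write model, assuming Python and a made-up minimum age of 13 (the rules above only say “a certain age”), the class below enforces the four rules listed earlier. The names `User`, `InvalidUser`, `MINIMUM_AGE` and the method names are illustrative assumptions, not part of any real 𝕏 codebase.

```python
from dataclasses import dataclass
from datetime import date

MINIMUM_AGE = 13  # assumed threshold; the business rule only says "a certain age"


class InvalidUser(ValueError):
    """Raised when a business rule of the user write model is broken."""


@dataclass
class User:
    name: str
    handle: str  # the '@'
    birthday: date
    biography: str = ""
    location: str = ""
    verified: bool = False

    def __post_init__(self) -> None:
        # Invariants are checked at construction, so an invalid user never exists.
        if not self.name or not self.handle:
            raise InvalidUser("a user must have a name and an '@'")
        if self.age_on(date.today()) < MINIMUM_AGE:
            raise InvalidUser(f"a user must be at least {MINIMUM_AGE} years old")

    def age_on(self, today: date) -> int:
        # Subtract one year if this year's birthday has not happened yet.
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)

    def verify(self) -> None:
        self.verified = True

    def change_biography(self, biography: str) -> None:
        self.biography = biography

    def change_location(self, location: str) -> None:
        self.location = location
```

Note that nothing in this class knows how a user will be displayed: it only protects the rules, which is exactly the responsibility we assign to the write side.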

Now, let’s try to understand what a read model is. Following the example of the “user” concept, a user can be represented in multiple ways in different contexts. For example, when we use the app, our own profile contains information about us as users, but when we check the profile of another user, we see less information about them than about ourselves.

These are two different concepts, “my profile” and “another profile”, although both refer to user information. Something that showcases this clearly is that, in a recent update of the platform, the “likes” feed of other users became hidden. Now we can only see our own.

Another example: when we look at an 𝕏 post, we can see the “likes” that it has received, and each like carries some information about the user who gave it, such as their @. Would it make sense to load all the information about a user (like their birthday or location) into a “like”?

Probably not. In fact, on 𝕏, we only see the profile picture and the @ of the user in the notifications tab. However, if we want to see our followers, we get a list of users that includes the bio, the profile picture, the @ and the name of each one.

Example of the view of a user (Elon Musk) who liked another user’s post, where only the name and the profile picture are shown.

As we can see, there are multiple ways of representing the same information from the “user” concept. We are representing the same user, with different information in different contexts. In the first example, we join information from concepts that are not necessarily part of the user entity (such as the “following”, “followers” and “subscriptions” counts) to build a new model. That model contains information about the user, but probably not all of it (security settings, for example, would make no sense in a view meant for another user).

This mix of information, and this particular view, should receive a specific name that is neither “user” nor the name of the relationships with other users (something like “social”?). Maybe, in the 𝕏 domain language, we could call this concept “Profile”, and it would be a new read model.

But wait a minute: couldn’t we just have one endpoint that retrieves user information and returns everything in a “super read model” that covers all cases? For example, a valid response that could serve both of the previous views would be:

GET https://twitter.com/elonmusk
{
  "userName": "Elon Musk",
  "identifier": "elonmusk",
  "joinedAt": "01-06-2009",
  "isFollowed": false,
  "isSubscribedTo": false,
  "following": 759,
  "followers": 197900000,
  "subscriptions": 169,
  "image": "https://pbs.twimg.com/profile_images/1815749056821346304/jS8I28PL_400x400.jpg"
}

Sure, this data would satisfy both use cases presented previously, and for some applications this kind of treatment for business concepts might be enough. But there are some issues that we should consider:

  • Overbloat: If our application keeps scaling, and users are referenced in more and more contexts, our “super read model” will need to grow to satisfy all the use cases at the same time. A model with 50 fields on a view that requires 2 doesn’t sound right, and the problem is multiplied by n whenever we return lists of n items, since every response carries payload that nobody reads.
  • Coupling: Since all the views rely on the same source of data (in this case, one endpoint), any regression or change of specification in it can affect every consumer, and the problem spreads through the whole application, to every place where the model is consumed.
  • Worse security: Since each part of the application receives far more information than it needs (we don’t need the birthday of a user to show that they liked a post), we expose data that is not required, and in the worst case sensitive, in contexts where it is not strictly necessary.
  • No parallel development: It is harder to parallelize work related to users, since even two different use cases end up modifying the same data source, and conflicts become more likely.
  • Worse performance: Since we fetch data that we do not need, possibly through slower queries, every place that consumes user-related information is affected. Even if we parallelize the most expensive parts of the query, the overall response is, at best, as fast as the slowest query of the batch.

So… how do we solve this? Our suggestion would be to use read models, making a specific view of our data for the use case that requires it, and adding as many as needed for each use case.

It is a paradigm change: instead of creating one very general solution (view) that covers all the cases, which would be a lot more complex and harder to maintain, we create smaller and simpler solutions for each problem we are solving. For the previous example, we could have two endpoints, each responding with the proper read model for the concept that is being represented.

GET https://twitter.com/elonmusk
{
  "userName": "Elon Musk",
  "identifier": "elonmusk",
  "joinedAt": "01-06-2009",
  "isFollowed": false,
  "isSubscribedTo": false,
  "following": 759,
  "followers": 197900000,
  "subscriptions": 169,
  "image": "https://pbs.twimg.com/profile_images/1815749056821346304/jS8I28PL_400x400.jpg"
}

// It's an array since a given tweet can have multiple likes
GET https://twitter.com/myUser/status/18358122669999683491/likes
[
  {
    "userName": "Elon Musk",
    "identifier": "elonmusk",
    "image": "https://pbs.twimg.com/profile_images/1815749056821346304/jS8I28PL_400x400.jpg"
  }
]
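On the server side, the two responses above could be backed by two small read models. The following is only a sketch in Python under loud assumptions: the stored row, its values and the names `STORED_USERS`, `LIKES_BY_POST`, `get_profile` and `get_likes` are all made up for illustration, and `MinifiedUser` borrows the name this article gives to the concept further down.

```python
from dataclasses import dataclass, fields

# Made-up stored row: every value here is a placeholder, not real data.
STORED_USERS = {
    "example_user": {
        "user_name": "Example User",
        "identifier": "example_user",
        "joined_at": "01-06-2009",
        "birthday": "01-01-1990",  # stored, but exposed by neither read model
        "following": 10,
        "followers": 120,
        "subscriptions": 3,
        "image": "https://example.com/avatar.jpg",
    }
}
# Made-up index of which users liked which post.
LIKES_BY_POST = {"post-1": ["example_user"]}


@dataclass(frozen=True)
class Profile:
    user_name: str
    identifier: str
    joined_at: str
    following: int
    followers: int
    subscriptions: int
    image: str


@dataclass(frozen=True)
class MinifiedUser:
    user_name: str
    identifier: str
    image: str


def _project(model, row):
    """Build a read model from a stored row, copying only the fields it declares."""
    return model(**{field.name: row[field.name] for field in fields(model)})


def get_profile(identifier: str) -> Profile:
    return _project(Profile, STORED_USERS[identifier])


def get_likes(post_id: str) -> list:
    # One MinifiedUser per like; unknown posts simply have no likes.
    return [_project(MinifiedUser, STORED_USERS[i]) for i in LIKES_BY_POST.get(post_id, [])]
```

Each read model declares exactly what its view needs, so the birthday in the stored row can never leak into a “like”, and adding a field to `Profile` cannot change what `get_likes` returns.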

Even though it can seem like more work (since we are now maintaining two endpoints, use cases, and queries), this approach has multiple advantages:

  • Leaner: Both the read and the write side benefit from this split. The write model doesn’t have to deal with multiple representation concerns (which can lead to bloated models), and the read side doesn’t have to worry about the constraints of the business model. Since the data is written properly, the read side only has to care about reading and representing it.
  • Maintainability: Even if it seems counterintuitive, this approach helps maintainability in the long run. Sure, we now have 2 endpoints instead of 1, but if we need to change, fix or add a view, none of the other views is affected, since each read model has its own purpose, and neither is the write model (they are unaware of each other). If some of these view models disappear in the future, we won’t have to work out which fields are still in use; we can just delete the endpoint without worrying that other parts of the application will be affected. All of this helps us respect the single responsibility principle (SRP)!
  • Security: By defining models that only return the required data, we hide whatever should not be visible in a particular view and reduce the exposed surface of our application, which minimizes possible leaks. It also lowers the chance of bugs and other issues.
  • Parallelization: If we have one team working on the write side and another working on the read side (or on multiple read sides!), splitting these concerns minimizes the friction between them.
  • Performance: Depending on the particular implementation of CQRS, and because we only retrieve the necessary information from the database, our queries will be at worst as fast as before, and can perform much better.

As for the concern that this is harder to maintain, I think it is important to stress the idea of the concept that is being represented.

For example, let’s say that the combination of the username and the @ that we saw in the second image is a concept in the domain language, and let’s call it “minified user”. This minified user can be referenced in multiple parts of the application, and even if the design is different, we are always referring to the same concept, so we would consume the same read model in all of those designs. And the other way around: if two designs look the same but represent different concepts, they should consume two different read models, to avoid all the trouble that we discussed previously.

With CQRS we do not intend to split every design into its own endpoint, but to split and properly name the concepts of the business. This is what we call the importance of the ubiquitous language in DDD, and to see more benefits of it, you can check this article.

Another important consideration is that, in our experience, read models should include as little business logic as possible or, even better, none at all: they should be dumb. Their only responsibility should be to present the data in the expected way, with the flexibility of always being able to add a new read model if required. This supports the advantages described previously (less duplication, less bloat, better performance) and keeps the focus for business rules on the write side. Since we are confident that our write models contain all the logic necessary to satisfy the business rules (preferably backed by good design and automated tests), adding constraints to the read models, as tempting as it can be, is unnecessary and therefore strictly worse.

Now that we know the theory behind this principle and the different models, we will take a look at different ways of implementing this kind of solution.

Implementing

There are many ways of implementing the split between models.

Repositories

Let’s assume that we already have the read model defined, and that all the sources from which we build it live in the same database. We could just define a specific repository whose queries join the required tables to obtain the data for the read model. This is the most straightforward solution, since we already understand the difference between the models X and Y and the model Z that results from combining some of their data.

The simplicity of this solution comes at the cost of computing time. Joining two tables is usually not much of a burden, depending on how it is done, but the more tables we join, the heavier each query becomes. So it is a plausible solution, but depending on the requirements we can find solutions that fit better.
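A repository of this kind could look like the sketch below, written in Python against an in-memory SQLite database so that it runs on its own. The `users` and `follows` tables, their columns, the sample rows and the `ProfileRepository` name are assumptions made for the example, not the schema of any real system.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    handle: str
    followers: int


class ProfileRepository:
    """Builds the Profile read model by joining the write-side tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_by_handle(self, handle: str) -> Optional[Profile]:
        # The join and the count run on every call: simple, but paid per query.
        row = self._conn.execute(
            """
            SELECT u.name, u.handle, COUNT(f.follower_id)
            FROM users AS u
            LEFT JOIN follows AS f ON f.followed_id = u.id
            WHERE u.handle = ?
            GROUP BY u.id
            """,
            (handle,),
        ).fetchone()
        return Profile(*row) if row else None


# Hypothetical schema and sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, handle TEXT NOT NULL UNIQUE);
    CREATE TABLE follows (follower_id INTEGER NOT NULL, followed_id INTEGER NOT NULL,
                          PRIMARY KEY (follower_id, followed_id));
    INSERT INTO users (id, name, handle) VALUES (1, 'Ada', 'ada'), (2, 'Bob', 'bob'), (3, 'Cy', 'cy');
    INSERT INTO follows (follower_id, followed_id) VALUES (2, 1), (3, 1);
    """
)
repository = ProfileRepository(conn)
ada_profile = repository.find_by_handle("ada")
```

The write-side tables stay untouched; the read model only exists as the return type of the repository.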

DBMS View

This is pretty similar to the previous solution. The main difference is that we consume the view as if it were a dedicated table with the shape that our read model requires, so we don’t need to write the joins manually in every query.
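To illustrate, here is a small Python sketch that uses SQLite’s `CREATE VIEW`. The `profiles` view, the two underlying tables and the sample rows are hypothetical names and data chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, handle TEXT NOT NULL UNIQUE);
    CREATE TABLE follows (follower_id INTEGER NOT NULL, followed_id INTEGER NOT NULL,
                          PRIMARY KEY (follower_id, followed_id));

    -- The view stores only the query, not its result.
    CREATE VIEW profiles AS
        SELECT u.handle AS handle, u.name AS name, COUNT(f.follower_id) AS followers
        FROM users AS u
        LEFT JOIN follows AS f ON f.followed_id = u.id
        GROUP BY u.id;

    INSERT INTO users (id, name, handle) VALUES (1, 'Ada', 'ada'), (2, 'Bob', 'bob'), (3, 'Cy', 'cy');
    INSERT INTO follows (follower_id, followed_id) VALUES (2, 1), (3, 1);
    """
)


def find_profile(conn: sqlite3.Connection, handle: str):
    """Read the profile as if `profiles` were a plain table."""
    return conn.execute(
        "SELECT handle, name, followers FROM profiles WHERE handle = ?", (handle,)
    ).fetchone()


ada_row = find_profile(conn, "ada")
```

The join still runs on every read, because a plain view is just a saved query. What we gain is that application code no longer has to know how the read model is assembled.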

Materialized View

A materialized view also gives us a table-like object with the desired shape, built by joining multiple tables, but in this case the result is actually stored in the database, so we don’t have to compute it at query time. This can improve response time considerably. It may seem like a trivial improvement over the previous options, but in a lot of systems the response time of queries is critical, and in those cases this kind of solution makes sense.

However, we have to take some downsides into consideration. The first one is that there may be some delay until the data is refreshed, so if the use case requires instant feedback from changes, this might not be the best option. Also, we will have some redundancy in the overall data, since the same information will exist both in the actual tables and in the new view.
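The refresh delay is easy to see in a small experiment. SQLite has no materialized views, so the Python sketch below imitates one with an ordinary table, `profiles_snapshot`, and a `refresh_profiles` function that plays the role that a statement like PostgreSQL’s `REFRESH MATERIALIZED VIEW` would play. All table names, function names and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, handle TEXT NOT NULL UNIQUE);
    CREATE TABLE follows (follower_id INTEGER NOT NULL, followed_id INTEGER NOT NULL,
                          PRIMARY KEY (follower_id, followed_id));
    -- Stand-in for a materialized view: a table that stores the query result.
    CREATE TABLE profiles_snapshot (handle TEXT PRIMARY KEY, name TEXT NOT NULL,
                                    followers INTEGER NOT NULL);
    INSERT INTO users (id, name, handle) VALUES (1, 'Ada', 'ada'), (2, 'Bob', 'bob'), (3, 'Cy', 'cy');
    INSERT INTO follows (follower_id, followed_id) VALUES (2, 1);
    """
)


def refresh_profiles(conn: sqlite3.Connection) -> None:
    """Recompute the stored snapshot from the source tables."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM profiles_snapshot")
        conn.execute(
            """
            INSERT INTO profiles_snapshot (handle, name, followers)
            SELECT u.handle, u.name, COUNT(f.follower_id)
            FROM users AS u
            LEFT JOIN follows AS f ON f.followed_id = u.id
            GROUP BY u.id
            """
        )


def followers_of(conn: sqlite3.Connection, handle: str):
    # Cheap read: no join and no count at query time.
    row = conn.execute(
        "SELECT followers FROM profiles_snapshot WHERE handle = ?", (handle,)
    ).fetchone()
    return row[0] if row else None


refresh_profiles(conn)
before = followers_of(conn, "ada")
conn.execute("INSERT INTO follows (follower_id, followed_id) VALUES (3, 1)")
stale = followers_of(conn, "ada")  # the new follow is not visible yet
refresh_profiles(conn)
fresh = followers_of(conn, "ada")  # visible only after the refresh
```

Between the insert and the second refresh, readers see the old count. How long that window lasts depends on when refreshes are scheduled, which is the tradeoff to weigh against the faster reads.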

Dedicated Tables (Projections)

As the last alternative, we will discuss projections. Of all the solutions, this is the one that best matches a classical CQRS approach. When we say projections, we are not referring to projections in the relational algebra sense, nor to any particular database feature. We are referring to the idea of using as many database tables as needed, one for every view of the model that is required, which may or may not be stored in different databases. When we want to persist or modify a certain entity, we do it in all of these tables, respecting in each case how the information should be stored and represented.

For the previous 𝕏 example, we could have a dedicated table for “users” and another for “profiles”. When we create or modify a user, it is persisted in the first table with the information required for the user read model of our application, and we also persist the information relevant to the profile read model. So we end up with two tables that hold information about the same user but are otherwise independent of each other. The mechanism that keeps them in sync may be the database itself, or a domain event that triggers multiple handlers, each responsible for writing its own representation of the user to its own table.

This approach also makes some parts of the application easier to deal with. For example, think of the “followers” of a user as a counter that increases every time someone follows the user and decreases when someone unfollows them. With the previous implementations, showing this number is a problem: since we cannot store the list of followers in the user entity (the user in the example has 197 million followers), we would have to either:

  • Run a SQL “count” on the table that we guess 𝕏 has named “followers” (where the relation between users is stored), which is not a viable solution, since this big calculation would be done every time we query for the user.
  • Add a field to the user entity that holds this number, polluting our original model and going back to the overbloat issue.
  • Create an intermediate table that just stores the number of followers of each user, which seems like overkill considering how many small one-off tables we would need for similar cases.

On the other hand, if we have a table dedicated to that particular view, then in the “User Follows User” use case that might exist we could just add the new entry to the “followers” table and emit a domain event. That event notifies our application that the follower count in the profile entity of the followed user must be increased by 1, which is a much faster and cleaner solution.
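The flow just described can be sketched in a few lines of Python. Everything here is a simplified assumption: the event bus is synchronous and in-memory, the “tables” are plain Python collections, and the names `UserFollowed`, `EventBus`, `ProfileProjection` and `FollowUser` are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserFollowed:
    """Domain event emitted by the write side."""
    follower: str
    followed: str


class EventBus:
    """Toy synchronous bus; a real one would usually deliver asynchronously."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class ProfileProjection:
    """Read side: keeps a precomputed follower counter per profile."""

    def __init__(self) -> None:
        self._followers = defaultdict(int)

    def on_user_followed(self, event: UserFollowed) -> None:
        self._followers[event.followed] += 1

    def followers_of(self, handle: str) -> int:
        return self._followers[handle]


class FollowUser:
    """Write side use case: records the relation and emits the event."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._follows = set()  # stand-in for the "followers" table

    def execute(self, follower: str, followed: str) -> None:
        if (follower, followed) in self._follows:
            return  # already following: nothing changes, no event
        self._follows.add((follower, followed))
        self._bus.publish(UserFollowed(follower, followed))


bus = EventBus()
projection = ProfileProjection()
bus.subscribe(UserFollowed, projection.on_user_followed)

follow_user = FollowUser(bus)
follow_user.execute("bob", "ada")
follow_user.execute("cy", "ada")
follow_user.execute("bob", "ada")  # duplicate follow is ignored
ada_followers = projection.followers_of("ada")
```

Reading the counter is now a single lookup, and a new read model only needs one more `subscribe` call. With an asynchronous bus, the counter would lag behind the write for a short time, which is the synchronization delay that comes with this approach.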

As we can see, this kind of solution is very powerful: we can completely decouple the databases and the ways of persisting, and even have multiple teams working in parallel (a new read model only requires a new handler for the emitted event), while keeping the query efficiency of materialized views. And as we saw in the previous example, some use cases become much easier to solve. The downside is, obviously, the greater complexity. We have new problems to think about, such as how the information is synchronized and how long that synchronization takes, and we have to build all the infrastructure ourselves. We will have to implement event-driven architecture components such as event buses, event emitters and event handlers, and we will have to deal with eventual consistency in the system, possibly with techniques such as event sourcing...

This chart shows the tradeoffs of each way of persisting our read models. The idea is taken from this video by the Spanish YouTube channel CodelyTv.

Your turn!

We hope that you learned something, or at least got some new ideas to improve your projects! Any feedback or question is greatly appreciated! Also, share this article if you think it can be helpful to a colleague dealing with this kind of problem.

CQRS and the use of read models can be a game-changing principle for complex or growing businesses. As we’ve seen, there are multiple benefits and several ways of implementing them. For consulting on this kind of solution, you can visit the AcidTango website and contact us.

Manuel Andrés Carrera Galafate

Hi! I am Manu, a software engineer currently working at Acid Tango as a backend developer.