NoSQL vs SQL schema
NoSQL gives us the flexibility of a dynamic schema, which is why we often call it schemaless. SQL, on the other hand, has a rigid data model with a predefined schema. The tooling around SQL is quite mature: in the .NET world there are many options for migrating a schema, such as Entity Framework, DbUp, FluentMigrator, and the list goes on.
Schemaless or schema pain?
NoSQL databases such as Cosmos DB are fun to work with during development when we do not have any workload in production. The schema is usually exposed as a POCO, and we can simply add or delete a property to update the data model.
However, once we move to production, updating the schema can become a pain. The flexibility of a dynamic schema starts to bite back, and we realize that NoSQL may not be as flexible as we thought.
Backward compatibility with NoSQL
When working with RavenDB or Cosmos DB, we often end up with properties marked Obsolete that we cannot remove, because old documents may still contain them. The problem gets worse as the application matures: we have to deal with additional complexity and null properties throughout the application. Here is an example of what an entity may look like after just a few schema changes.
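As a hypothetical illustration (the class and property names below are assumptions made for this sketch, not taken from a real code base), an Order entity that has been through two schema changes might look like this with Json.NET attributes:

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical entity: every name here is illustrative only.
public class Order
{
    [JsonProperty("id")]
    public string Id { get; set; }

    // v1: the customer was stored as a single string.
    [Obsolete("Replaced by Customer in v2; kept so old documents still deserialize.")]
    [JsonProperty("customerName")]
    public string CustomerName { get; set; }

    // v2: the customer became a nested object. Null for v1 documents.
    [JsonProperty("customer")]
    public Customer Customer { get; set; }

    // v1 and v2: the amount was called "total".
    [Obsolete("Renamed to TotalAmount in v3; kept so old documents still deserialize.")]
    [JsonProperty("total")]
    public decimal? Total { get; set; }

    // v3: current name of the amount. Null for v1 and v2 documents.
    [JsonProperty("totalAmount")]
    public decimal? TotalAmount { get; set; }
}

public class Customer
{
    [JsonProperty("name")]
    public string Name { get; set; }
}
```

With an entity like this, every read site needs fallbacks such as `order.Customer?.Name ?? order.CustomerName`, and each new schema change adds another pair.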
We need to handle these obsolete properties throughout the code, which becomes messy and hard to maintain.
RavenDB gives us the ability to update the schema through events. Unfortunately, the Cosmos DB SDK offers no such option. Let us take the above example to see how the Order document could evolve over time in a real-world application.
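To make that evolution concrete, here is one hypothetical sequence of three document shapes. The field names are assumptions chosen for illustration, and the raw string literals assume C# 11 or later:

```csharp
// Hypothetical Order documents as they might be stored at each stage.
public static class SampleOrderDocuments
{
    // v1: flat customer name, amount stored as "total".
    public const string V1 = """
        { "id": "1", "customerName": "Ada Lovelace", "total": 12.5 }
        """;

    // v2: the customer moves into a nested object.
    public const string V2 = """
        { "id": "1", "customer": { "name": "Ada Lovelace" }, "total": 12.5 }
        """;

    // v3: "total" is renamed to "totalAmount".
    public const string V3 = """
        { "id": "1", "customer": { "name": "Ada Lovelace" }, "totalAmount": 12.5 }
        """;
}
```

All three shapes can sit side by side in the same container, because nothing rewrites the older documents when the entity changes.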
Updating Cosmos Schema to the latest Version
Ideally, we would like to deal only with OrderV3 across the entire code base, without worrying about documents stored in an older schema version.
We can achieve this by manipulating the raw JSON as it is read from Cosmos DB. A Cosmos DB trigger is one way to accomplish this. However, triggers and stored procedures only support JavaScript at the time of writing, and moving business logic outside of .NET or C# may not be the first choice for many teams.
Custom JSON Serializer to Update Cosmos schema
In my previous post, I talked about creating a custom JSON serializer with the Cosmos DB SDK. We can use the same approach to update documents with an old schema version before they are loaded through the Cosmos DB SDK.
We start by introducing a schema version property on the Cosmos document. The schema version is an integer that we increment for each schema change.
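A minimal sketch of such a property, assuming a hypothetical `CosmosDocument` base class and a JSON field named `schemaVersion` (neither name comes from the SDK), could look like this:

```csharp
using Newtonsoft.Json;

// Hypothetical base class shared by all documents in the container.
public abstract class CosmosDocument
{
    [JsonProperty("id")]
    public string Id { get; set; }

    // Incremented by one for every schema change.
    [JsonProperty("schemaVersion")]
    public int SchemaVersion { get; set; }
}

// The entity only models the latest shape; older shapes are handled elsewhere.
public class Order : CosmosDocument
{
    public const int CurrentSchemaVersion = 3;

    public Order()
    {
        // New entities are always stamped with the current version.
        SchemaVersion = CurrentSchemaVersion;
    }

    [JsonProperty("customer")]
    public Customer Customer { get; set; }

    [JsonProperty("totalAmount")]
    public decimal TotalAmount { get; set; }
}

public class Customer
{
    [JsonProperty("name")]
    public string Name { get; set; }
}
```

Documents written before the property existed simply lack the field, so any code that inspects the raw JSON has to treat a missing `schemaVersion` as the oldest version.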
Next, we update CosmosJsonDotNetSerializer from the previous post so that it rewrites the raw JSON of older documents to the latest schema version.
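The sketch below shows one way this could be wired up. It is a stand-in for the serializer from the previous post, not a copy of it: the class name `VersionedCosmosSerializer`, the field names, and the two upgrade steps are assumptions for illustration. It also assumes the container holds only Order documents, that a missing `schemaVersion` means the oldest version, and that a query page may reach the serializer as a JSON array.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public sealed class VersionedCosmosSerializer : CosmosSerializer
{
    private const int CurrentSchemaVersion = 3;
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);
    private readonly JsonSerializer serializer = JsonSerializer.CreateDefault();

    public override T FromStream<T>(Stream stream)
    {
        // If the caller wants the raw stream, return it untouched and undisposed.
        if (typeof(Stream).IsAssignableFrom(typeof(T)))
        {
            return (T)(object)stream;
        }

        using (stream)
        using (var streamReader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(streamReader))
        {
            // Keep date-like strings as strings so the raw JSON is not altered.
            jsonReader.DateParseHandling = DateParseHandling.None;
            JToken token = JToken.Load(jsonReader);
            UpdateSchemaVersionToCurrent(token);
            return token.ToObject<T>(serializer);
        }
    }

    public override Stream ToStream<T>(T input)
    {
        var payload = new MemoryStream();
        using (var streamWriter = new StreamWriter(payload, Utf8NoBom, 1024, leaveOpen: true))
        using (var jsonWriter = new JsonTextWriter(streamWriter))
        {
            serializer.Serialize(jsonWriter, input);
        }
        payload.Position = 0;
        return payload;
    }

    private static void UpdateSchemaVersionToCurrent(JToken token)
    {
        // A point read yields one object; a query page is assumed to be an array.
        IEnumerable<JObject> documents = token is JArray array
            ? array.OfType<JObject>()
            : token is JObject single ? new[] { single } : Enumerable.Empty<JObject>();

        foreach (JObject document in documents)
        {
            Upgrade(document);
        }
    }

    private static void Upgrade(JObject document)
    {
        int version = document.Value<int?>("schemaVersion") ?? 1;
        if (version >= CurrentSchemaVersion) return;

        if (version < 2)
        {
            // Hypothetical step 1 -> 2: move "customerName" into a nested object.
            JToken name = document["customerName"];
            if (name != null)
            {
                document.Remove("customerName");
                document["customer"] = new JObject { ["name"] = name };
            }
        }

        if (version < 3)
        {
            // Hypothetical step 2 -> 3: rename "total" to "totalAmount".
            JProperty total = document.Property("total");
            total?.Replace(new JProperty("totalAmount", total.Value));
        }

        document["schemaVersion"] = CurrentSchemaVersion;
    }
}
```

The serializer can be registered through `CosmosClientOptions.Serializer` when the `CosmosClient` is created. Each step checks for the old property before touching it, so documents that were already written in a newer shape but never stamped with a version pass through unharmed.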
As you can see in the above code, we call the UpdateSchemaVersionToCurrent method before deserializing the stream returned from Cosmos. In this method, we update the JSON object of the document to the latest schema version.
This way, we always deal with the latest Order entity throughout the code, and when we save the entity back to Cosmos, it is persisted with the latest schema version.
We can take it further
We can also run the schema migration as a separate process, for example in an Azure Function that loads documents with an older schema version, converts them to the latest version, and saves them back to Cosmos.
This process is the equivalent of the migration scripts that we run for a SQL schema.
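A rough sketch of such a process is shown below. It assumes the container is partitioned by `/id`, that the version lives in a `schemaVersion` field, and that the caller supplies the per-document upgrade logic as a delegate. `OrderSchemaMigration` and its members are hypothetical names; the Cosmos calls are from the .NET SDK v3.

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json.Linq;

public static class OrderSchemaMigration
{
    // Documents written before versioning was introduced have no schemaVersion at all.
    public const string StaleDocumentsQuery =
        "SELECT * FROM c WHERE NOT IS_DEFINED(c.schemaVersion) OR c.schemaVersion < @current";

    // Returns the number of documents that were rewritten in this pass.
    public static async Task<int> MigrateAsync(
        Container container,
        int currentVersion,
        Action<JObject> upgrade,
        CancellationToken cancellationToken = default)
    {
        QueryDefinition query = new QueryDefinition(StaleDocumentsQuery)
            .WithParameter("@current", currentVersion);

        int migrated = 0;
        using FeedIterator<JObject> iterator = container.GetItemQueryIterator<JObject>(query);
        while (iterator.HasMoreResults)
        {
            foreach (JObject document in await iterator.ReadNextAsync(cancellationToken))
            {
                // The delegate is expected to set schemaVersion to currentVersion.
                upgrade(document);

                string id = document.Value<string>("id");
                var options = new ItemRequestOptions
                {
                    // Only overwrite if nobody changed the document in the meantime.
                    IfMatchEtag = document.Value<string>("_etag")
                };

                try
                {
                    // Assumption: the partition key path is /id.
                    await container.ReplaceItemAsync(
                        document, id, new PartitionKey(id), options, cancellationToken);
                    migrated++;
                }
                catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
                {
                    // A concurrent write won; the next pass picks the document up again.
                }
            }
        }

        return migrated;
    }
}
```

A timer-triggered function could call `MigrateAsync` repeatedly until it returns zero. Because the documents pass through Json.NET as `JObject`, it is worth checking on a test container that date-like strings round-trip unchanged before running this against production data.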
Wrapping Up
Dealing with older schema versions is not trivial with NoSQL databases. However, I hope this post gives you some insight into how you can achieve this with Cosmos.
Photo by Markus Winkler on Unsplash