
Updating Cosmos DB document schema

NoSQL vs SQL schema

NoSQL gives us the flexibility of a dynamic schema, which is why we often call it schemaless. SQL, on the other hand, has a rigid data model with a predefined schema, and the tooling around it is quite mature. In the .NET world, there are many options for migrating a schema, such as Entity Framework, DbUp, FluentMigrator, and the list goes on.

Schemaless or schema pain?

NoSQL databases such as Cosmos DB are fun to work with during development when we do not have any workload in production. The schema is usually exposed as a POCO, and we can simply add or delete a property to update the data model.

However, once we move to production, updating the schema can become a pain. The flexibility of a dynamic schema starts to bite back, and we realize that NoSQL may not be as flexible as we thought.

Backward compatibility with NoSQL

When working with RavenDB or Cosmos DB, we often end up with Obsolete properties that we cannot remove because old documents may still contain them. The problem gets worse as the application matures, and we have to deal with additional complexity and null properties throughout the application. Here is an example of what an entity may look like after just a few schema changes.

public class Order
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [Obsolete("OrderName is obsolete, use Name instead")]
    public string OrderName { get; set; }

    [Obsolete("HasShipped is obsolete, set Status to OrderStatus.Shipped instead")]
    public bool HasShipped { get; set; }

    public string Name { get; set; }
    public OrderStatus Status { get; set; }
}

public enum OrderStatus
{
    Started,
    Processed,
    Shipped,
    Completed
}

We need to deal with these obsolete properties throughout the code, which makes it messy and hard to maintain.

RavenDB lets us update the schema through events. Unfortunately, the Cosmos DB SDK has no such option. Let us take the above example to see how the Order document could evolve over time in a real-world application.

public class OrderV1
{
    [JsonProperty("id")]
    public string Id { get; set; }
    public string OrderName { get; set; }
    public bool HasShipped { get; set; }
}

public class OrderV2
{
    [JsonProperty("id")]
    public string Id { get; set; }
    // OrderName renamed to Name
    public string Name { get; set; }
    public bool HasShipped { get; set; }
}

public class OrderV3
{
    [JsonProperty("id")]
    public string Id { get; set; }
    public string Name { get; set; }
    // Introduced OrderStatus instead of HasShipped
    public OrderStatus OrderStatus { get; set; }
}

public enum OrderStatus
{
    Started,
    Processed,
    Shipped,
    Completed
}

Updating Cosmos Schema to the latest Version

Ideally, we would like to deal only with OrderV3 in the entire code base, without worrying about documents stored in an older schema version.

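We can achieve this by manipulating the raw JSON as it is read from Cosmos DB. Server-side programming is one option: for example, a stored procedure could rewrite old documents in place. Cosmos DB triggers, on the other hand, run only on write operations, so they cannot upgrade a document as it is read. However, stored procedures and triggers only support JavaScript at the time of writing, and moving business logic out of .NET and C# may not be the first choice for many teams.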

Custom JSON Serializer to Update Cosmos schema

In my previous post, I talked about creating a custom JSON serializer with the Cosmos DB SDK. We can use the same approach to upgrade documents stored in an older schema version before the SDK hands them to our code.

We start by introducing a property called SchemaVersion in the Cosmos DB document. SchemaVersion is an integer that we increment with each schema change.

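public class Order
{
    [JsonProperty("id")]
    public string Id { get; set; }
    public string Name { get; set; }
    public OrderStatus OrderStatus { get; set; }

    // New documents start at the current schema version, so they are
    // never mistaken for older documents when they are read back.
    public int SchemaVersion { get; set; } = 3;
}

public enum OrderStatus
{
    Started,
    Processed,
    Shipped,
    Completed
}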
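One detail the version number alone does not cover: documents written before SchemaVersion was introduced (the OrderV1 and OrderV2 shapes above) do not carry the property at all. A minimal sketch of a fallback, assuming the older versions can be recognized by their properties, could inspect the document's shape instead. OrderSchema and DetectSchemaVersion are hypothetical names, and the sketch uses System.Text.Json.Nodes (.NET 6 or later) rather than Newtonsoft's JObject so it runs without extra packages; the same checks map directly onto JObject.

```csharp
using System.Text.Json.Nodes;

// Hypothetical helper: infers the schema version of a raw Order document.
// Assumption: V1 and V2 documents were written before SchemaVersion existed,
// so they can only be told apart by the properties they contain.
public static class OrderSchema
{
    public const int CurrentSchemaVersion = 3;

    public static int DetectSchemaVersion(JsonObject document)
    {
        // Documents written after the property was introduced carry it explicitly.
        if (document.TryGetPropertyValue("SchemaVersion", out JsonNode? version) && version is not null)
        {
            return version.GetValue<int>();
        }

        // V1 documents still use OrderName.
        if (document.ContainsKey("OrderName"))
        {
            return 1;
        }

        // V2 documents have Name but still track shipping with HasShipped.
        if (document.ContainsKey("HasShipped"))
        {
            return 2;
        }

        // Otherwise assume the document already has the V3 shape.
        return CurrentSchemaVersion;
    }
}
```

Shape detection only works while each version leaves a distinct footprint, which is why writing SchemaVersion explicitly on every new document remains the more reliable signal.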

Next, we update CosmosJsonDotNetSerializer from the previous post to manipulate the raw JSON of older documents and bring it up to the latest schema version.

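// Code removed for brevity
using System.IO;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public sealed class CosmosJsonDotNetSerializer : CosmosSerializer
{
    private const int CurrentSchemaVersion = 3;

    // Code removed for brevity

    public override T FromStream<T>(Stream stream)
    {
        using (stream)
        {
            // Callers that ask for the raw stream get it back untouched.
            if (typeof(Stream).IsAssignableFrom(typeof(T)))
            {
                return (T)(object)stream;
            }

            using (var sr = new StreamReader(stream))
            using (var jsonTextReader = new JsonTextReader(sr))
            {
                var jsonSerializer = GetSerializer();
                // Read the raw JSON first so it can be upgraded before it is bound to T.
                var jObject = jsonSerializer.Deserialize<JObject>(jsonTextReader);
                return UpdateSchemaVersionToCurrent<T>(jObject, jsonSerializer);
            }
        }
    }

    // This runs for every type read through the serializer. If a container holds
    // more than one document type, check the type before applying the Order steps.
    private static T UpdateSchemaVersionToCurrent<T>(JObject jObject, JsonSerializer jsonSerializer)
    {
        // Documents written before SchemaVersion was introduced do not have it, so start at 1.
        // Each step checks for the old property, so a V2 document passes through step 1 unchanged.
        var schemaVersion = jObject["SchemaVersion"]?.Value<int>() ?? 1;

        for (var i = schemaVersion; i < CurrentSchemaVersion; i++)
        {
            switch (i)
            {
                case 1:
                    // V1 -> V2: OrderName was renamed to Name.
                    if (jObject.TryGetValue("OrderName", out var orderName))
                    {
                        jObject["Name"] = orderName;
                        jObject.Remove("OrderName");
                    }
                    break;
                case 2:
                    // V2 -> V3: HasShipped was replaced by OrderStatus.
                    if (jObject.TryGetValue("HasShipped", out var hasShipped))
                    {
                        jObject["OrderStatus"] = (int)(hasShipped.Value<bool>()
                            ? OrderStatus.Shipped
                            : OrderStatus.Processed);
                        jObject.Remove("HasShipped");
                    }
                    break;
            }
        }

        jObject["SchemaVersion"] = CurrentSchemaVersion;
        // Use the configured serializer so its settings also apply to the final conversion.
        return jObject.ToObject<T>(jsonSerializer);
    }

    // Code removed for brevity
}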

As you can see in the above code, FromStream first deserializes the stream returned from Cosmos DB into a JObject and passes it to UpdateSchemaVersionToCurrent. That method walks the document's JSON forward one schema version at a time until it reaches the latest version, and only then converts it to the requested type.

This way, we always deal with the latest Order entity throughout the code, and when we save the entity back to Cosmos, it is persisted with the latest schema version.
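The switch statement works well for two steps. As the number of versions grows, one option is to register one upgrade function per version, so each schema change lives in a single place and the loop itself never changes. The sketch below is a hypothetical restructuring of the same idea, not the code from the previous post: OrderUpgrader, Steps and UpgradeToCurrent are illustrative names, and it uses System.Text.Json.Nodes instead of JObject so it is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical per-version upgrade pipeline for raw Order documents.
public static class OrderUpgrader
{
    public const int CurrentSchemaVersion = 3;

    // Mirrors the enum in the post: Started = 0, Processed = 1, Shipped = 2, Completed = 3.
    private const int Processed = 1;
    private const int Shipped = 2;

    // Each entry upgrades a document from version N to version N + 1.
    private static readonly Dictionary<int, Action<JsonObject>> Steps = new()
    {
        // V1 -> V2: OrderName renamed to Name.
        [1] = doc =>
        {
            if (doc.TryGetPropertyValue("OrderName", out JsonNode? name))
            {
                doc.Remove("OrderName"); // detach the node before re-parenting it
                doc["Name"] = name;
            }
        },
        // V2 -> V3: HasShipped replaced by OrderStatus.
        [2] = doc =>
        {
            if (doc.TryGetPropertyValue("HasShipped", out JsonNode? hasShipped))
            {
                doc.Remove("HasShipped");
                bool shipped = hasShipped?.GetValue<bool>() ?? false;
                doc["OrderStatus"] = shipped ? Shipped : Processed;
            }
        },
    };

    public static JsonObject UpgradeToCurrent(JsonObject doc, int schemaVersion)
    {
        for (int version = schemaVersion; version < CurrentSchemaVersion; version++)
        {
            // A missing step means a schema change was made without an upgrade path.
            if (!Steps.TryGetValue(version, out Action<JsonObject>? step))
            {
                throw new InvalidOperationException($"No upgrade step from schema version {version}.");
            }
            step(doc);
        }

        doc["SchemaVersion"] = CurrentSchemaVersion;
        return doc;
    }
}
```

Because each step moves a document forward by exactly one version, a V1 document runs through both steps while a V2 document runs only the second one, and adding a V4 means adding one dictionary entry and bumping CurrentSchemaVersion.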

We can take it further

We can also run the schema migration as a separate process, for example, in an Azure Function that loads documents stored in an older schema version, converts them to the latest version, and saves them back to Cosmos DB.

This process is the equivalent of the migration scripts we run for a SQL schema.
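A minimal sketch of such a migration job, assuming the Cosmos DB calls are passed in as delegates so the loop itself stays testable: SchemaMigration, MigrateAll and its parameters are hypothetical names, and the query string only illustrates how outdated documents might be selected with the built-in IS_DEFINED function. The version check keeps the job idempotent, so it can safely be re-run.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical batch migration, e.g. the body of a timer-triggered Azure Function.
public static class SchemaMigration
{
    // Illustrative query for documents that still need upgrading
    // (IS_DEFINED is a built-in Cosmos DB SQL function).
    public const string OutdatedDocumentsQuery =
        "SELECT * FROM c WHERE NOT IS_DEFINED(c.SchemaVersion) OR c.SchemaVersion < @currentVersion";

    // outdatedDocuments, detectVersion, upgrade and save stand in for the real
    // query results, version detection, upgrade steps and Cosmos DB write.
    public static int MigrateAll(
        IEnumerable<JsonObject> outdatedDocuments,
        int currentVersion,
        Func<JsonObject, int> detectVersion,
        Action<JsonObject, int> upgrade,
        Action<JsonObject> save)
    {
        int migrated = 0;
        foreach (JsonObject document in outdatedDocuments)
        {
            int version = detectVersion(document);
            if (version >= currentVersion)
            {
                continue; // already current, nothing to write
            }

            upgrade(document, version);
            save(document);
            migrated++;
        }
        return migrated;
    }
}
```

In a real job, saving back should use optimistic concurrency, for example by passing each document's _etag through ItemRequestOptions.IfMatchEtag, so that an update made by the application in the meantime is not overwritten.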

Wrapping Up

Dealing with older schema versions is not trivial with NoSQL databases. However, I hope this post gives you some insight into how you can handle it with Cosmos DB.

Photo by Markus Winkler on Unsplash

Comments

3 responses to “Updating Cosmos DB document schema”

  1. Volo

    Great article, thanks!
    We faced the same pain on current project, but we use EF Core Cosmos DB provider (not SDK).
    I can’t find the way we can use CosmosJsonDotNetSerializer in EF, do you think it’s possible?


    1. Ankit Vijay

      Thanks, glad you liked the article. I have not used EF with Cosmos. But I’m sure there should be a way.


  2. […] created CosmosJsonDotNetSerializer inspired from Cosmos DB SDK. CosmosJsonDotNetSerializer exposes the FromStream method that allows us to deal with raw JSON. You […]
