This article is written by our sponsor nautilus2k.com, a new startup that helps other startups build data-driven apps on the Microsoft .NET Core/Azure tech stack.
This is part 3 in the series and builds on the previous two articles:
If you have read the previous two articles, you know that we talked about generating most of your data access layer. We also mentioned that the generated code is built on top of Entity Framework Core and is integrated with caching, which gives you extreme scalability & performance.
In this article, we will explain what this means & why it is a significant advancement over the norm.
Reading changes from databases
Many companies have built services that synchronize their relational database to their cache. They do this to achieve near real-time sync, so that their applications can show real-time activity. As an example, if your vehicle data is stored in SQL Server and you can efficiently cache the data as it comes in, you can show vehicles moving around on Google Maps without creating massive load on your database.
The problems with this scheme are:
- It is extremely hard to implement and to keep running 24x7 without issues.
- It is tightly customized to your database, so it can't really be reused.
- Your database still needs to run at very high performance (bare metal or significant CPU capacity), to be able to keep your cache up to date.
Companies have gone from database triggers to database technologies like change tracking & change data capture (CDC) to find changes & sync them with the cache in near real-time.
The main issue with all these schemes is that your database needs to be run at very high performance for these changes to get synced in near real-time. That is very expensive to do in the public cloud.
Change Data Capture records historical change data in parallel CDC tables. This means that if you make 1 million changes to a table, it will store 1 million rows in the CDC table. If you delete and recreate 1 million rows, it will create 2 million rows in the CDC table. Not only does this take up significant disk space, it can quickly consume all the disk I/O you have.
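The contrast can be sketched in a few lines. The structures below stand in for SQL Server's internals and are purely illustrative: CDC appends one row per historical change, while change tracking keeps only the latest change per primary key.

```python
# Sketch: why CDC storage grows with every change, while change
# tracking keeps only the net (latest) change per row.
# All names here are illustrative, not part of any real API.

cdc_log = []            # CDC: one entry per change, appended forever
change_tracking = {}    # change tracking: latest change per primary key

def record_change(pk, operation, version):
    cdc_log.append((pk, operation, version))      # grows without bound
    change_tracking[pk] = (operation, version)    # overwritten in place

# 1,000 rows inserted, then all 1,000 deleted and re-inserted:
version = 0
for op in ("insert", "delete", "insert"):
    for pk in range(1000):
        version += 1
        record_change(pk, op, version)

print(len(cdc_log))          # 3000 -- CDC stored every historical change
print(len(change_tracking))  # 1000 -- one entry per row survives
```

This is the heart of the storage and I/O difference: CDC's cost grows with write volume, change tracking's cost grows only with table size.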
What we do differently
Our service can synchronize any database to cache or search without custom code. But we still operate under constraints:
- Your sync performance will always be determined by your CPU capacity & database load.
- You need to enable change tracking & snapshot isolation at the database level.
- You need to enable change tracking for all tables.
- Most importantly, you need to add a timestamp column to all tables which need to be synced.
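In T-SQL, these prerequisites typically look like the statements below. The database and table names (`MyDb`, `dbo.Vehicles`) and the retention settings are hypothetical placeholders; adapt them to your own schema.

```python
# Sketch: SQL Server prerequisites for sync, expressed as T-SQL
# statements you would run against your database. Names are placeholders.

SETUP_STATEMENTS = [
    # Enable snapshot isolation and change tracking at the database level.
    "ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON",
    "ALTER DATABASE MyDb SET CHANGE_TRACKING = ON "
    "(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)",
    # Enable change tracking for each table that needs to be synced.
    "ALTER TABLE dbo.Vehicles ENABLE CHANGE_TRACKING",
    # Add the single rowversion (timestamp) column used for versioning.
    "ALTER TABLE dbo.Vehicles ADD RowVersion rowversion",
]

for stmt in SETUP_STATEMENTS:
    print(stmt)  # in practice: execute via your SQL client of choice
```

Statements 3 and 4 are repeated per table; the first two are one-time, database-level settings.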
A table in SQL Server can have only one timestamp (rowversion) column. But this column is essential: without it, you cannot know which version of a row is present at any time:
- In memory
- In the cache
- Somewhere else you synced the row
This timestamp column is what lets you determine whether your cache is in sync with your database, and it drives reconciliation when issues arise (and they will arise).
These database requirements were developed over years of building synchronization systems and observing their issues and failures. You get the benefit of all that experience when you go with our platform.
Cache Integration within the DAL
As we mentioned earlier, we don't map between two systems when we synchronize them. When you point a database at our service, we synchronize it to the cache and use our own data structures within the cache to store the data efficiently.
You use our Fusion Core APIs to read from and write to this cache. And when you code-generate your Data Access Layer, the generated code serves as a good template for reading and writing to the cache using our APIs.
You will notice that you can specify whether to read from the cache (true by default). In addition, for every read operation we verify that the cache is up to date with the database before reading from the cache; otherwise, we fall back to reading from the database.
There are algorithms in place that ensure that if the database is being written to on any server where your DAL is hosted, reads wait a certain time for that write to complete before proceeding to read from the cache. This is important when you scale using stateless servers in the cloud.
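One way such a wait can be implemented is with per-key events, as in the sketch below. The structure, names, and timeout are assumptions for illustration only, not the actual algorithm.

```python
# Sketch: making reads wait briefly for an in-flight write on the
# same server before trusting the cache. Illustrative only.

import threading

_pending_writes = {}   # key -> Event set when the write completes
_lock = threading.Lock()

def begin_write(key):
    with _lock:
        _pending_writes[key] = threading.Event()

def end_write(key):
    with _lock:
        ev = _pending_writes.pop(key, None)
    if ev:
        ev.set()

def wait_for_write(key, timeout=0.5):
    """Return True if no write is pending, or it finished in time."""
    with _lock:
        ev = _pending_writes.get(key)
    return ev.wait(timeout) if ev else True

begin_write("vehicle:9")
end_write("vehicle:9")
print(wait_for_write("vehicle:9"))  # True -- safe to read from the cache
```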
Also, whenever we read from the database for any reason, we always update the cache immediately. This ensures that if you save data, and reload it, your changes are immediately present in the cache, and you can still utilize the cache.
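Putting the pieces together, the read path described above can be sketched as follows. The dict-backed "database" and the function names are illustrative assumptions; the real Fusion Core APIs will differ.

```python
# Sketch of the read path: check the cache, verify its version
# against the database, fall back to the database when stale, and
# refresh the cache on every database read.

database = {"vehicle:1": {"speed": 55, "version": 3}}
cache = {}

def db_read(key):
    return dict(database[key])

def db_version(key):
    return database[key]["version"]  # cheap version-only lookup

def read(key, from_cache=True):
    if from_cache and key in cache:
        # Verify the cached copy is current before trusting it.
        if cache[key]["version"] == db_version(key):
            return cache[key]
    # Fallback: read from the database and refresh the cache
    # immediately, so a save-then-reload sees its own changes.
    row = db_read(key)
    cache[key] = row
    return row

print(read("vehicle:1")["speed"])  # 55 -- first read hit the database
print("vehicle:1" in cache)        # True -- cache refreshed on that read
```

Note how the cache is repaired as a side effect of every database read; that is what keeps it correct even when the sync service or the caching code has a bug.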
In this way, the cache service and the caching framework work together to ensure that whether a developer introduces a bug in the caching code or the service has an issue, your cache stays up to date and works in all scenarios.
It is caching that "just works". You don't have to deal with caching issues, because we assume human error is inevitable and handle it in our system.
Why this scales
This scales because most reads are served from the cache, which is cheap to scale, while the database becomes mostly a data storage mechanism.
All searches are handled by the search system, and the search indexes are updated via the same change tracking mechanism as the cache. Hence, search "just works", and you never have to debug search issues in your code.
Moving the read and search load to the cache & search systems allows us to scale the database in a cost-effective manner.
How this helps simplify Business Logic
When everything is in the cache, it becomes easier and faster to just get the data from the cache and work on it.
To make this process viable, we provide mechanisms to index the cache rows by one or more columns, just like a database.
We also let you order records by one or more columns, just like a database.
This enables you to pick and choose the data you need directly from the cache, completely bypassing the need for queries in the vast majority of non-reporting use cases.
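The indexing and ordering described above can be sketched with plain in-memory structures. Everything here (the rows, the index, the helper names) is illustrative, not the real API:

```python
# Sketch: indexing cached rows by one column and ordering by another,
# mimicking database-style lookups without issuing a query.

from collections import defaultdict

rows = [
    {"id": 1, "region": "east", "speed": 40},
    {"id": 2, "region": "west", "speed": 70},
    {"id": 3, "region": "east", "speed": 55},
]

# Build a secondary index: column value -> list of matching rows.
index_by_region = defaultdict(list)
for row in rows:
    index_by_region[row["region"]].append(row)

# The cache-side equivalent of
# "SELECT * FROM rows WHERE region = 'east' ORDER BY speed DESC":
east = sorted(index_by_region["east"], key=lambda r: r["speed"], reverse=True)
print([r["id"] for r in east])  # [3, 1]
```

The index lookup is a dictionary access rather than a table scan, which is why this stays fast as the cached data grows.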
Simplifying business logic code in this manner lets developers focus on business rules instead of queries and database optimization.
Everything is easier when you can implement business logic directly, instead of having to express it through queries and database optimization:
- Complex Security Implementation
- Complex Business Rules
This is a completely new way of writing code, one that most developers outside the largest companies, like Microsoft & Google, will be unfamiliar with.
Making this kind of technology accessible to everyone is our goal.