Photo by Zane Lee on Unsplash

Should CockroachDB be taught at universities?

Wait, you haven’t heard about CockroachDB yet? You are not to blame; I too came across this gem only recently. Better late than never. As the name suggests, it is a database.

If you have some previous knowledge of databases, you might wonder, “What kind of database is this, SQL or NoSQL?” It is neither. It is a special type of database called Distributed SQL. Hang on, let me reveal how it differs from its alternatives, and why you should try it for your next project.

Plain Old Databases

I won’t bore you with too much database history, but we need some details for a deeper insight into the origin of CockroachDB.

This tale starts with the publication of “A Relational Model of Data for Large Shared Data Banks”, an academic paper by Edgar F. Codd. This paper opened up new possibilities with its radical data model. Databases built on it delivered rich indexes that could support almost any query.

It used table joins, a term for read operations that draw together separate records into one. And it leveraged transactions, which meant a combination of reads and writes across the database.

SQL, the Structured Query Language, became the language of data.

Relational databases, architected around the assumption of running on a single machine, lacked something that became essential with the internet: the ability to scale out. As data grew from millions to billions of records, a single server could no longer handle the workload. Scaling required a massive investment of time, and often the sacrifice of many of the features that brought developers to these databases in the first place.

The only path forward was to move from a single database server to a cluster of database nodes working in concert. NoSQL databases arrived on the scene. With the ability to scale out and to tolerate node failures with minimal disruption, they became a favourite. Scalability became cheap.

NoSQL came with new capabilities, but also with compromises. Typically the biggest compromise was on consistency. Besides, without joins and transactions, or with only limited indexes, these databases had shortcomings that engineers had to work their way around.

Legacy SQL databases have tried to fill the gap in the years since with add-on features that reduce the pain of scaling out. NoSQL systems have been building out a subset of their missing SQL functionality. But none of these were architected from the ground up to deliver what we might call Distributed SQL.

The Future is here

Introducing the hero of our story! CockroachDB is a resilient, consistent distributed SQL database. It takes its name from the resilient nature of the cockroach, which survived extinction events that even the dinosaurs couldn’t.

We have now set the bar for what can be called a basic distributed SQL database: scale, consistency, resiliency, and SQL with ACID transactions. CockroachDB bakes all of this in and goes even further.

  • CockroachDB uses serializable isolation by default, the highest level possible (it simply means that every transaction behaves as if it had exclusive use of the entire cluster from start to finish).
  • CockroachDB uses the open-source PostgreSQL Wire Protocol, tapping into a mature, existing ecosystem of drivers and ORMs the developers have relied upon for years.
  • CockroachDB sets the bar even higher by adding geo-replication (the ability to control where your data resides in a globally distributed cluster), allowing users to peg data to a particular locality.
  • CockroachDB is multi-cloud: it doesn’t care which cloud provider you're using.
  • CockroachDB provides administration tools, both CLI and GUI.

Add other features such as a cost-based query optimiser and baked-in security, including user authentication, authorisation, audit logging, and encryption of your data both on disk and over the wire, and CockroachDB becomes a single, complete solution.

Let me walk you through the internal workings of CockroachDB. It will excite you even more.

Deep Dive into CockroachDB

For now, consider CockroachDB as a single-node typical SQL database, which means we can connect our app or SQL client to CockroachDB and do normal SQL stuff with it. We can create databases, tables, or perform CRUD operations.

Let’s look under the hood for a deeper understanding of how it works.

The node that the app or the client connects to is called the Gateway Node. The machine that the CockroachDB node runs on has resources such as CPU cores, memory, and disk. CockroachDB is built on top of the Pebble key-value store, which it uses as its storage engine.

The layer that the app connects to is the SQL Layer. It creates logical and physical plans, which it sends to the Distribution/Transaction Layer. The Distribution layer maps the SQL data into key-value pairs, grouped into chunks called Ranges (64 MB by default in older releases, larger in recent ones), and writes them to disk. Along with data ranges there are also System Ranges, which are essential to CockroachDB’s functioning.
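To make the idea of rows becoming key-value pairs grouped into ranges more concrete, here is a toy sketch in Python. It is not CockroachDB's real encoding: the key format, the function names encode_row and split_into_ranges, and the tiny max_bytes limit are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

KV = Tuple[str, str]

def encode_row(table: str, pk: int, row: Dict[str, str]) -> List[KV]:
    """Turn one SQL row into key-value pairs, one per column.

    The key layout /table/primary-key/column is a made-up format for
    illustration, not the encoding CockroachDB actually uses.
    """
    return [(f"/{table}/{pk:08d}/{col}", val) for col, val in sorted(row.items())]

def split_into_ranges(pairs: List[KV], max_bytes: int) -> List[List[KV]]:
    """Group sorted key-value pairs into contiguous ranges of bounded size."""
    ranges: List[List[KV]] = []
    current: List[KV] = []
    size = 0
    for key, val in sorted(pairs):
        pair_size = len(key) + len(val)
        # Start a new range when adding this pair would exceed the limit.
        if current and size + pair_size > max_bytes:
            ranges.append(current)
            current, size = [], 0
        current.append((key, val))
        size += pair_size
    if current:
        ranges.append(current)
    return ranges

# Three rows of a hypothetical "users" table.
pairs: List[KV] = []
for pk, name in enumerate(["ada", "bob", "cy"], start=1):
    pairs += encode_row("users", pk, {"name": name})

# A tiny limit so the split is visible; the real limit is many megabytes.
ranges = split_into_ranges(pairs, max_bytes=48)
for i, r in enumerate(ranges):
    print(f"range {i}: {r[0][0]} .. {r[-1][0]}")
```

Because the keys are sorted, each range covers a contiguous slice of the key space, which is what lets a range be moved or replicated as a unit.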

Now, where is the distributed part?

Let’s add more nodes to our cluster, but how many nodes do we need?

To ensure consistency, CockroachDB uses a consensus protocol that requires a “quorum” on any changes to a range before those changes are committed. Since 3 is the smallest number that can achieve quorum while still being resilient to partitions or node failures, CockroachDB requires at least 3 nodes. The consensus protocol it uses is called Raft. The link at the end of this post is a great resource to learn more about Raft.

CockroachDB distributes your data among your nodes and replicates each range to at least three nodes. The number of failures that can be tolerated is (replication factor - 1) / 2, rounded down.
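The arithmetic is easy to check for yourself. The small Python sketch below computes the quorum size and the number of tolerated failures for a few replication factors; the function names are my own and not part of any CockroachDB API.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor: int) -> int:
    """(replication factor - 1) / 2, rounded down."""
    return (replication_factor - 1) // 2

for rf in (3, 5, 7):
    print(f"replication factor {rf}: quorum {quorum_size(rf)}, "
          f"tolerates {tolerated_failures(rf)} failure(s)")
```

Note that an even replication factor buys nothing extra: 4 replicas still tolerate only one failure, which is why odd numbers are the usual choice.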

Truly elastic architecture for effortless growth

CockroachDB makes scale so simple, you don’t have to think about it. It automatically distributes data and workload demand. It breaks free from manual sharding and complex workarounds.

Indestructible data for business-critical apps

Downtime isn’t an option, and data loss destroys companies. CockroachDB is architected to handle unpredictability and survive machine, data centre, and region failures.

With replication factor = 3, one node failure can be tolerated. In that case, the ranges are under-replicated, but still available. If another node goes down, the quorum is no longer met, and the ranges become unavailable.
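Here is that scenario as a small Python simulation. It is a simplified model under my own assumptions: a three-node cluster with no spare nodes to re-replicate onto, and a hypothetical range_status helper that only counts live replicas. The real system does much more, such as moving replicas to healthy nodes when any are available.

```python
from typing import Iterable, Set

def range_status(replica_nodes: Iterable[int], down_nodes: Iterable[int]) -> str:
    """Classify a range by how many of its replicas are on live nodes."""
    replicas: Set[int] = set(replica_nodes)
    live = len(replicas - set(down_nodes))
    quorum = len(replicas) // 2 + 1  # strict majority of the replicas
    if live == len(replicas):
        return "fully replicated"
    if live >= quorum:
        return "under-replicated but available"
    return "unavailable"

replicas = {1, 2, 3}  # one range replicated on nodes 1, 2 and 3

print(range_status(replicas, down_nodes=[]))
print(range_status(replicas, down_nodes=[2]))
print(range_status(replicas, down_nodes=[2, 3]))
```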

Now you have seen the future and have a good sense of CockroachDB. They have a great learning platform, Cockroach University, for developing your knowledge of CockroachDB. Hope to see your next project built with CRDB😊.

Here are some useful links:

A bit more knowledge about the Raft Consensus Algorithm

For any questions or suggestions, you can reach out to me on Instagram or LinkedIn. I would be more than happy to lend a helping hand.

I would really appreciate one of these 👏
