Resource-Based Slot Range Splitting in a Distributed Database
For learning purposes, I have been reading research papers on databases, including GFS, CockroachDB, DynamoDB, Raft, and others.
DynamoDB uses consistent hashing for key distribution. It also uses virtual nodes, so that when a node dies its keys are spread across many of the remaining nodes and rebalancing is less disruptive.
I had the idea of using a server's resources (disk, CPU, RAM, network) to determine how much data or load it should handle. Hardware specs such as RAM and disk are not enough on their own to determine a server's capacity: if a node has high latency or frequent network issues, it is not a reliable place to hold data. Incorporating network measurements gives a better resource score for each node.
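To make the idea of a resource score concrete, here is a minimal sketch in Go. It is only an illustration, not IrisDB's actual formula: the `NodeStats` fields, the `Weights` type, the weight values, and the choice of a weighted average are all assumptions, and every measurement is assumed to be already normalized to the range 0 to 1.

```go
package main

import "fmt"

// NodeStats holds hypothetical measurements for one node, each already
// normalized to the range [0, 1] where 1 is the best possible value.
type NodeStats struct {
	DiskFree      float64 // fraction of disk space still free
	CPUIdle       float64 // fraction of CPU time idle
	RAMFree       float64 // fraction of memory still free
	NetworkHealth float64 // 1 = low latency and no recent failures
}

// Weights says how much each resource contributes to the score.
// The values used in main are made up for illustration.
type Weights struct {
	Disk, CPU, RAM, Network float64
}

// clamp01 keeps a measurement inside [0, 1].
func clamp01(v float64) float64 {
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// ResourceScore returns the weighted average of the node's measurements,
// so the result is also in [0, 1]. It returns 0 if the weights do not
// sum to a positive number.
func ResourceScore(s NodeStats, w Weights) float64 {
	total := w.Disk + w.CPU + w.RAM + w.Network
	if total <= 0 {
		return 0
	}
	sum := w.Disk*clamp01(s.DiskFree) +
		w.CPU*clamp01(s.CPUIdle) +
		w.RAM*clamp01(s.RAMFree) +
		w.Network*clamp01(s.NetworkHealth)
	return sum / total
}

func main() {
	// Network is weighted more heavily than the hardware resources.
	w := Weights{Disk: 1, CPU: 1, RAM: 1, Network: 2}
	healthy := NodeStats{DiskFree: 0.8, CPUIdle: 0.7, RAMFree: 0.6, NetworkHealth: 0.9}
	flaky := NodeStats{DiskFree: 0.9, CPUIdle: 0.9, RAMFree: 0.9, NetworkHealth: 0.1}
	fmt.Printf("healthy=%.2f flaky=%.2f\n", ResourceScore(healthy, w), ResourceScore(flaky, w))
}
```

With these made-up numbers, the node with more free hardware but a poor network ends up with the lower score, which is the behavior the network term is meant to capture.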
Keys are mapped to slots using CRC16, with a fixed total of 16384 slots. For storage I used Pebble, a storage engine similar to RocksDB, which spared me the headache of writing the storage layer myself. Even so, I greatly underestimated how much work goes into writing a database.
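The key-to-slot mapping mentioned above can be sketched as follows. The post only says "CRC16", so the exact variant is an assumption here: this sketch uses CRC-16/XMODEM (polynomial 0x1021, initial value 0), the variant Redis Cluster uses for its 16384 hash slots. The names `crc16XModem` and `KeySlot` are hypothetical.

```go
package main

import "fmt"

// numSlots is the fixed number of slots described in the post.
const numSlots = 16384

// crc16XModem computes CRC-16/XMODEM bit by bit: polynomial 0x1021,
// initial value 0, no input or output reflection. Which CRC16 variant
// the project really uses is an assumption.
func crc16XModem(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// KeySlot maps a key to a slot in the range [0, numSlots).
func KeySlot(key string) int {
	return int(crc16XModem([]byte(key))) % numSlots
}

func main() {
	for _, key := range []string{"123456789", "user:42", "order:7"} {
		fmt.Printf("%q -> slot %d\n", key, KeySlot(key))
	}
}
```

Because the slot count is fixed, splitting a range only changes which node owns a slot; the slot a key hashes to never changes.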
Coming back to the topic, I would like to know whether resource-based range splitting has already been implemented elsewhere, and where I can find more information on it. I developed an architecture around this idea called IrisDB, which uses the resource-score mechanism to decide which node should take over part of a slot range when it is split. This is a learning project, and I gladly welcome any feedback.