MergeDB
A distributed, fault-tolerant key-value store.
Architecture
MergeDB handles state reconciliation across distributed nodes without centralized coordination. By leveraging Conflict-Free Replicated Data Types (CRDTs), it ensures eventual consistency even during network partitions.
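To make the convergence guarantee concrete, here is a minimal, self-contained sketch of a PN-Counter, one of the CRDT types MergeDB supports. The type and method names are illustrative, not MergeDB's actual API; the point is that the merge (element-wise max) is commutative, associative, and idempotent, so replicas converge no matter how gossip messages are ordered or duplicated:

```rust
use std::collections::HashMap;

/// Illustrative PN-Counter: per-node increment and decrement tallies.
#[derive(Clone, Debug, PartialEq)]
struct PnCounter {
    incs: HashMap<String, u64>,
    decs: HashMap<String, u64>,
}

impl PnCounter {
    fn new() -> Self {
        Self { incs: HashMap::new(), decs: HashMap::new() }
    }

    fn increment(&mut self, node: &str, by: u64) {
        *self.incs.entry(node.to_string()).or_insert(0) += by;
    }

    fn decrement(&mut self, node: &str, by: u64) {
        *self.decs.entry(node.to_string()).or_insert(0) += by;
    }

    /// Merge = element-wise max. Commutative, associative, and idempotent,
    /// so replicas converge regardless of delivery order or duplication.
    fn merge(&mut self, other: &PnCounter) {
        for (node, &v) in &other.incs {
            let e = self.incs.entry(node.clone()).or_insert(0);
            *e = (*e).max(v);
        }
        for (node, &v) in &other.decs {
            let e = self.decs.entry(node.clone()).or_insert(0);
            *e = (*e).max(v);
        }
    }

    fn value(&self) -> i64 {
        self.incs.values().sum::<u64>() as i64
            - self.decs.values().sum::<u64>() as i64
    }
}
```

Because each node only ever advances its own tallies, taking the per-node maximum during merge can never lose a local update, which is what makes the reconciliation coordination-free.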
Implementation Details
Communication & Stack
Inter-node communication is built entirely on tonic, a high-performance gRPC framework, and backed by the tokio asynchronous runtime for fast, non-blocking I/O. This allows nodes to efficiently multiplex epidemic gossip payloads.
Client-facing interactions are handled via a lightweight axum API server, with system observability provided by structured tracing (WIP).
Gossip Protocol & Fan-out
To prevent unbounded message amplification across the cluster, I implemented a custom gossip protocol based on epidemic dissemination. Epidemic algorithms mimic the spread of a contagious disease: when a node receives new data, it forwards that message to a randomly selected set of peers of bounded size, known as the fan-out. MergeDB currently uses a fixed fan-out of $K=3$ to balance network load against reliable propagation.
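Peer selection for a bounded fan-out can be sketched as a partial shuffle. This is an illustration, not MergeDB's actual selection code; the tiny linear congruential generator stands in for a real random source purely to keep the example dependency-free:

```rust
/// Illustrative fan-out selection: pick up to `fan_out` distinct peers.
/// A minimal LCG replaces a proper RNG to avoid external dependencies.
fn select_gossip_targets(peers: &[String], fan_out: usize, seed: u64) -> Vec<String> {
    let mut indices: Vec<usize> = (0..peers.len()).collect();
    let mut state = seed;
    // Partial Fisher-Yates: only the first `fan_out` slots need shuffling.
    for i in 0..fan_out.min(indices.len()) {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = i + (state as usize) % (indices.len() - i);
        indices.swap(i, j);
    }
    indices
        .iter()
        .take(fan_out)
        .map(|&i| peers[i].clone())
        .collect()
}
```

A partial shuffle guarantees the selected peers are distinct, so a node never wastes one of its $K$ slots gossiping to the same peer twice in a round.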
Furthermore, the implementation uses strict state-change detection during the CRDT merge phase. Nodes only bump the last_updated timestamp when a received payload actually mutates local state, breaking the infinite feedback loops inherent in naïve gossip protocols:
```rust
async fn gossip_changes(
    &self,
    changes: tonic::Request<GossipChangesRequest>,
) -> Result<tonic::Response<GossipChangesResponse>, tonic::Status> {
    let request = changes.into_inner();
    let key = request.key.clone();
    let remote_crdt = parse_crdt_payload(&request); // PN-Counters, AWSets, LWW-Registers

    self.store
        .entry(key.clone())
        .and_modify(|stored_value| {
            // 1. Snapshot state before merge
            let old_state = stored_value.data.clone();

            // 2. Apply CRDT merge (commutative, associative, idempotent)
            stored_value.data.merge(&remote_crdt);

            // 3. Strict change detection to halt gossip loops
            if stored_value.data != old_state {
                println!("Merged NEW update for {}", key);
                stored_value.last_updated = SystemTime::now();
            } else {
                println!("Ignored redundant update for {}", key);
            }
        })
        .or_insert_with(|| StoredValue {
            data: remote_crdt.clone(),
            last_updated: SystemTime::now(),
        });

    Ok(tonic::Response::new(GossipChangesResponse { success: true }))
}
```
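The strict-change rule can be seen in isolation with a minimal last-writer-wins register. The type below is a hypothetical stand-in (MergeDB's actual LWW-Register may differ); it uses a logical timestamp and reports whether the merge changed anything, which is exactly the signal that decides whether gossip continues:

```rust
/// Illustrative LWW-Register: the merge keeps whichever write carries the
/// larger logical timestamp.
#[derive(Clone, Debug, PartialEq)]
struct LwwRegister {
    value: String,
    ts: u64, // logical timestamp of the write itself
}

impl LwwRegister {
    /// Returns true only when the merge actually changed local state --
    /// the same condition the gossip handler uses to decide whether to
    /// bump last_updated and keep propagating.
    fn merge(&mut self, remote: &LwwRegister) -> bool {
        if remote.ts > self.ts {
            *self = remote.clone();
            true
        } else {
            false
        }
    }
}
```

The second delivery of an identical payload returns false, so the redundant update is absorbed rather than re-gossiped, and the epidemic dies out once the cluster has converged.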
Performance Benchmarks
Continuous benchmarking is critical to ensure the merge() operations don't bottleneck the Tokio executors. MergeDB tracks serialization and reconciliation overhead using Criterion. The latest performance reports can be viewed here.
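As a sense of what such a micro-benchmark measures, here is a dependency-free timing sketch (the project itself uses Criterion, which adds statistical analysis on top of this idea). The merge body is a stand-in, an element-wise max join over a map, not MergeDB's actual reconciliation code:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative micro-benchmark: time `iterations` merges of a 64-entry
/// map and return the elapsed time plus a checksum of the merged state.
fn time_merges(iterations: u32) -> (Duration, u64) {
    let mut local: HashMap<String, u64> = HashMap::new();
    let remote: HashMap<String, u64> =
        (0..64).map(|i| (format!("node-{i}"), i)).collect();

    let start = Instant::now();
    for _ in 0..iterations {
        // Element-wise max merge: the same shape as a CRDT join.
        for (k, &v) in &remote {
            let e = local.entry(k.clone()).or_insert(0);
            *e = (*e).max(v);
        }
    }
    let checksum: u64 = local.values().sum();
    (start.elapsed(), checksum)
}
```

Criterion would run this body many times, discard warm-up iterations, and report confidence intervals, which is what makes regressions in merge() cost visible across commits.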
Future Directions
- Advanced Replication: Maturing the multi-node replication and dynamic peer discovery protocols.
- Expanded Data Types: Implementing richer CRDT types (like Add-Wins Sets and Maps) beyond standard registers and PN-Counters.
- Industry Benchmarks: Direct throughput and latency comparisons against Redis and DynamoDB.
- Ecosystem: Developing native client libraries across multiple languages.
- LSM Trees: Transitioning the storage engine from an in-memory DashMap to a disk-backed Log-Structured Merge (LSM) Tree. This will utilize custom merge operators to handle CRDT reconciliation during background compaction, allowing datasets to grow beyond RAM limits while maintaining durability.
- BFT (Byzantine Fault Tolerance): Upgrading the trust model. Currently, CRDTs assume all nodes are honest. Implementing BFT mechanisms will prevent malicious or compromised nodes from inhibiting the proper function of MergeDB.
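To preview the Add-Wins Set direction listed above, here is a simplified sketch of the observed-remove idea. Everything here is hypothetical, not MergeDB code: each add is tagged with a unique (node, counter) "dot", a remove deletes only the dots it has observed, and the merge unions live dots, so a concurrent re-add survives. A full implementation would also carry a causal context so that removes survive merges with stale replicas, which is omitted here for brevity:

```rust
use std::collections::{HashMap, HashSet};

/// Simplified Add-Wins Set sketch. Each add carries a unique (node, counter)
/// dot; a remove drops only the dots observed locally. NOTE: a complete
/// OR-Set also tracks a causal context so removes propagate through merges;
/// this sketch only demonstrates the add-wins outcome for concurrent ops.
#[derive(Clone, Debug, PartialEq)]
struct AwSet {
    // element -> set of live dots
    entries: HashMap<String, HashSet<(String, u64)>>,
}

impl AwSet {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn add(&mut self, elem: &str, node: &str, counter: u64) {
        self.entries
            .entry(elem.to_string())
            .or_default()
            .insert((node.to_string(), counter));
    }

    fn remove(&mut self, elem: &str) {
        self.entries.remove(elem); // drops only the dots observed locally
    }

    /// Union of live dots: an add on either replica wins over a concurrent
    /// remove that never observed that add's dot.
    fn merge(&mut self, other: &AwSet) {
        for (elem, dots) in &other.entries {
            self.entries
                .entry(elem.clone())
                .or_default()
                .extend(dots.iter().cloned());
        }
    }

    fn contains(&self, elem: &str) -> bool {
        self.entries.get(elem).map_or(false, |d| !d.is_empty())
    }
}
```

The "add wins" bias is the design choice: when one replica removes an element while another concurrently re-adds it, the merged state keeps the element, which is usually the least surprising outcome for set semantics.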