Architecture

Database Sharding & Replication

Horizontal partitioning and replication strategies for data that outgrows a single node.

Overview

Database sharding is horizontal partitioning: splitting data across multiple database instances (shards) so each shard holds a subset of the total dataset. Replication creates copies of data across nodes for fault tolerance and read scaling. Together they allow a database to handle data volumes and read/write rates that exceed what a single node can support.

Origin

Sharding emerged as a pattern when web-scale companies outgrew vertical scaling. Google, Amazon, and Facebook all built custom sharding solutions in the mid-2000s. MongoDB popularised built-in sharding (2010). Vitess (YouTube, open-sourced 2012) enables sharding for MySQL at massive scale.

Examples

Sharding key selection and routing

# Choose a shard based on customer_id (range-based sharding)
SHARD_RANGES = [
  { shard: 'db-shard-0', min: 0,         max: 9_999_999 },
  { shard: 'db-shard-1', min: 10_000_000, max: 19_999_999 },
  { shard: 'db-shard-2', min: 20_000_000, max: Float::INFINITY },
]

def shard_for(customer_id)
  SHARD_RANGES.find { |r| customer_id.between?(r[:min], r[:max]) }[:shard]
end

# Hash-based sharding: distributes more evenly, harder to rebalance
def shard_for_hash(customer_id, shard_count = 4)
  "db-shard-#{Digest::MD5.hexdigest(customer_id.to_s).to_i(16) % shard_count}"
end

# Queries must include the shard key so the router can target the right shard
def find_order(customer_id, order_id)
  db = connect_to(shard_for(customer_id))
  db.query("SELECT * FROM orders WHERE customer_id = ? AND id = ?", customer_id, order_id)
end

Queries that do not include the shard key (customer_id) must fan out to all shards and merge results, expensive and slow. Schema design must ensure common queries always include the shard key.

Use Cases

01Tables that exceed what a single node can store (billions of rows, terabytes)
02Write-heavy workloads that exceed a single node's write throughput
03Multi-tenant SaaS: shard by tenant to isolate customer data physically

When Not to Use

//Before exhausting vertical scaling, larger instances, read replicas, and caching often defer the need to shard by years
//When cross-shard joins are frequent, sharding makes them expensive or impossible
//When the operational complexity of managing multiple database instances is not justified by the workload

Technical Notes

Cross-shard queries: aggregations and joins across shards require scatter-gather (fan out to all shards, merge in application). Avoid designing schemas that require them
Rebalancing shards is the hardest operational task. Consistent hashing minimises the data movement when adding shards
Hot shards: if one shard key is disproportionately active (a viral user, a large tenant), that shard becomes a bottleneck. Choose shard keys that distribute load evenly
Read replicas are not shards: replicas add read capacity on the same data; shards partition the data. Use both together