Database Sharding & Replication
Horizontal partitioning and replication strategies for data that outgrows a single node.
Overview
Database sharding is horizontal partitioning: splitting data across multiple database instances (shards) so each shard holds a subset of the total dataset. Replication creates copies of data across nodes for fault tolerance and read scaling. Together they allow a database to handle data volumes and read/write rates that exceed what a single node can support.
Origin
Sharding emerged as a pattern when web-scale companies outgrew vertical scaling. Google, Amazon, and Facebook all built custom sharding solutions in the mid-2000s. MongoDB popularised built-in sharding (2010). Vitess (YouTube, open-sourced 2012) enables sharding for MySQL at massive scale.
Examples
Sharding key selection and routing
# Choose a shard based on customer_id (range-based sharding)
SHARD_RANGES = [
{ shard: 'db-shard-0', min: 0, max: 9_999_999 },
{ shard: 'db-shard-1', min: 10_000_000, max: 19_999_999 },
{ shard: 'db-shard-2', min: 20_000_000, max: Float::INFINITY },
]
def shard_for(customer_id)
SHARD_RANGES.find { |r| customer_id.between?(r[:min], r[:max]) }[:shard]
end
# Hash-based sharding: distributes more evenly, harder to rebalance
def shard_for_hash(customer_id, shard_count = 4)
"db-shard-#{Digest::MD5.hexdigest(customer_id.to_s).to_i(16) % shard_count}"
end
# Queries must include the shard key so the router can target the right shard
def find_order(customer_id, order_id)
db = connect_to(shard_for(customer_id))
db.query("SELECT * FROM orders WHERE customer_id = ? AND id = ?", customer_id, order_id)
endQueries that do not include the shard key (customer_id) must fan out to all shards and merge results, expensive and slow. Schema design must ensure common queries always include the shard key.
Use Cases
- 01Tables that exceed what a single node can store (billions of rows, terabytes)
- 02Write-heavy workloads that exceed a single node's write throughput
- 03Multi-tenant SaaS: shard by tenant to isolate customer data physically
When Not to Use
- //Before exhausting vertical scaling, larger instances, read replicas, and caching often defer the need to shard by years
- //When cross-shard joins are frequent, sharding makes them expensive or impossible
- //When the operational complexity of managing multiple database instances is not justified by the workload
Technical Notes
- Cross-shard queries: aggregations and joins across shards require scatter-gather (fan out to all shards, merge in application). Avoid designing schemas that require them
- Rebalancing shards is the hardest operational task. Consistent hashing minimises the data movement when adding shards
- Hot shards: if one shard key is disproportionately active (a viral user, a large tenant), that shard becomes a bottleneck. Choose shard keys that distribute load evenly
- Read replicas are not shards: replicas add read capacity on the same data; shards partition the data. Use both together
More in Architecture