Architecture

Database Sharding & Replication

Horizontal partitioning and replication strategies for data that outgrows a single node.

Overview

Database sharding is horizontal partitioning: splitting data across multiple database instances (shards) so each shard holds a subset of the total dataset. Replication creates copies of data across nodes for fault tolerance and read scaling. Together they allow a database to handle data volumes and read/write rates that exceed what a single node can support.

Origin

Sharding emerged as a pattern when web-scale companies outgrew vertical scaling. Google, Amazon, and Facebook all built custom sharding solutions in the mid-2000s. MongoDB popularised built-in sharding (2010). Vitess (YouTube, open-sourced 2012) enables sharding for MySQL at massive scale.

Examples

Sharding key selection and routing

# Choose a shard based on customer_id (range-based sharding)
SHARD_RANGES = [
  { shard: 'db-shard-0', min: 0,         max: 9_999_999 },
  { shard: 'db-shard-1', min: 10_000_000, max: 19_999_999 },
  { shard: 'db-shard-2', min: 20_000_000, max: Float::INFINITY },
]

def shard_for(customer_id)
  SHARD_RANGES.find { |r| customer_id.between?(r[:min], r[:max]) }[:shard]
end

# Hash-based sharding: distributes more evenly, harder to rebalance
def shard_for_hash(customer_id, shard_count = 4)
  "db-shard-#{Digest::MD5.hexdigest(customer_id.to_s).to_i(16) % shard_count}"
end

# Queries must include the shard key so the router can target the right shard
def find_order(customer_id, order_id)
  db = connect_to(shard_for(customer_id))
  db.query("SELECT * FROM orders WHERE customer_id = ? AND id = ?", customer_id, order_id)
end

Queries that do not include the shard key (customer_id) must fan out to all shards and merge results, expensive and slow. Schema design must ensure common queries always include the shard key.

Use Cases

  • 01Tables that exceed what a single node can store (billions of rows, terabytes)
  • 02Write-heavy workloads that exceed a single node's write throughput
  • 03Multi-tenant SaaS: shard by tenant to isolate customer data physically

When Not to Use

  • //Before exhausting vertical scaling, larger instances, read replicas, and caching often defer the need to shard by years
  • //When cross-shard joins are frequent, sharding makes them expensive or impossible
  • //When the operational complexity of managing multiple database instances is not justified by the workload

Technical Notes

  • Cross-shard queries: aggregations and joins across shards require scatter-gather (fan out to all shards, merge in application). Avoid designing schemas that require them
  • Rebalancing shards is the hardest operational task. Consistent hashing minimises the data movement when adding shards
  • Hot shards: if one shard key is disproportionately active (a viral user, a large tenant), that shard becomes a bottleneck. Choose shard keys that distribute load evenly
  • Read replicas are not shards: replicas add read capacity on the same data; shards partition the data. Use both together