Load Balancing
Distributing incoming traffic across multiple instances to maximise throughput and resilience.
Overview
A load balancer distributes incoming requests across multiple instances of a service to maximise throughput, minimise response time, and ensure availability when individual instances fail. It operates at Layer 4 (transport, TCP/UDP) or Layer 7 (application, HTTP), with L7 allowing routing decisions based on URL, headers, or cookies.
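An L7 balancer parses the HTTP request, so it can choose a backend pool from the path or a header before any instance-selection strategy runs. Below is a minimal sketch using a hypothetical first-match route table. The pool names, the `X-Canary` header and the `route` helper are illustrative only and do not match any product's configuration syntax:

```ruby
# Hypothetical L7 route table: rules are checked in order and the first match wins.
Rule = Struct.new(:pool, :path_prefix, :header, :header_value, keyword_init: true)

ROUTES = [
  Rule.new(pool: 'canary', header: 'X-Canary', header_value: 'true'),
  Rule.new(pool: 'api',    path_prefix: '/api/'),
  Rule.new(pool: 'static', path_prefix: '/assets/'),
  Rule.new(pool: 'web') # no conditions: catch-all default
].freeze

# Returns the name of the backend pool for a request.
# `headers` is assumed to be a Hash keyed by header name.
def route(path, headers, routes = ROUTES)
  routes.find do |rule|
    header_ok = rule.header.nil? || headers[rule.header] == rule.header_value
    path_ok = rule.path_prefix.nil? || path.start_with?(rule.path_prefix)
    header_ok && path_ok
  end&.pool
end

route('/api/users', {})                       # => "api"
route('/api/users', { 'X-Canary' => 'true' }) # => "canary"
route('/about', {})                           # => "web"
```

An L4 balancer sees only addresses and ports, so it cannot apply rules like these. It can only run a strategy such as round robin or IP hash on each connection as a whole.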
Origin
Load balancing hardware appeared in the 1990s (Cisco LocalDirector, 1996; F5 BIG-IP, 1997). Software load balancers (HAProxy, 2000; nginx as reverse proxy, 2004) made it possible without dedicated hardware. Cloud providers (AWS ELB, 2009) abstracted it further into a managed service. Modern service meshes such as Istio, typically built on the Envoy proxy, handle it in a sidecar proxy deployed alongside each service instance.
Examples
Load balancing strategies and their trade-offs
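# Demonstrating selection strategies (not actual HAProxy config)
require 'digest'
instances = ['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000']
# Round Robin: predictable, ignores instance load.
# The caller keeps a request counter, so the shared list is never mutated.
def round_robin(instances, request_number)
  instances[request_number % instances.length]
end
# Least Connections: routes to the instance with the fewest active connections
def least_connections(instances, connection_counts)
  instances.min_by { |i| connection_counts.fetch(i, 0) }
end
# IP Hash: the same client IP always maps to the same instance (sticky sessions),
# as long as the instance list does not change
def ip_hash(instances, client_ip)
  index = Digest::MD5.hexdigest(client_ip).to_i(16) % instances.length
  instances[index]
end
# Weighted Round Robin: more powerful instances get more traffic
weights = { '10.0.0.1:3000' => 3, '10.0.0.2:3000' => 1, '10.0.0.3:3000' => 1 }
# Naive expansion: each instance appears `weight` times in the rotation, so
# round_robin(weighted_pool, n) sends 3 of every 5 requests to 10.0.0.1
weighted_pool = weights.flat_map { |instance, weight| [instance] * weight }
Use Cases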
- Horizontal scaling: distribute load across many small instances rather than one large server
- Zero-downtime deployments: take instances out of the pool during updates (rolling deploys)
- Health checking: automatically remove unhealthy instances from the pool
- Geographic routing: route users to the nearest data centre
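Rolling deploys and health checking both control which instances are eligible for traffic. Below is a minimal sketch built around a hypothetical `BackendPool` class. The health check is passed in as a block. In production that block would typically call an application endpoint that exercises real dependencies. Here it is simulated with a list of "down" instances:

```ruby
# Minimal sketch: a pool that routes only to instances that are both healthy
# and enabled. BackendPool is a hypothetical class, not a library API.
class BackendPool
  def initialize(instances, &health_check)
    @instances = instances
    # Without a block, every instance is treated as healthy.
    @health_check = health_check || ->(_instance) { true }
    @healthy = instances.to_h { |i| [i, true] }
    @disabled = {} # taken out by an operator, e.g. during a rolling deploy
    @cursor = 0
  end

  # Run one round of health checks and record each result.
  def check_all
    @instances.each { |i| @healthy[i] = @health_check.call(i) ? true : false }
  end

  def disable(instance)
    @disabled[instance] = true
  end

  def enable(instance)
    @disabled.delete(instance)
  end

  # Instances currently eligible for traffic, in their original order.
  def available
    @instances.select { |i| @healthy[i] && !@disabled[i] }
  end

  # Round robin over whatever is currently available.
  def next_instance
    pool = available
    raise 'no healthy instances' if pool.empty?

    instance = pool[@cursor % pool.length]
    @cursor += 1
    instance
  end
end

down = ['10.0.0.2:3000'] # simulate one failing instance
pool = BackendPool.new(['10.0.0.1:3000', '10.0.0.2:3000', '10.0.0.3:3000']) { |i| !down.include?(i) }
pool.check_all
pool.available                # => ["10.0.0.1:3000", "10.0.0.3:3000"]
pool.disable('10.0.0.3:3000') # e.g. while it is being redeployed
pool.available                # => ["10.0.0.1:3000"]
```

Real load balancers usually require several consecutive failures before ejecting an instance, and several successes before restoring it, to avoid flapping. HAProxy exposes these thresholds as the `fall` and `rise` server options.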
When Not to Use
- Single-instance applications where the complexity of a load balancer is unwarranted
- Stateful protocols that require persistent connections to a specific backend, unless sticky sessions are configured
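Sticky sessions can be keyed on a cookie instead of the client IP. This keeps affinity stable when a client's IP changes. It also prevents every client behind one NAT address from landing on the same instance. Below is a minimal sketch. The cookie name `lb_backend`, the opaque instance IDs and the `sticky_route` helper are all hypothetical. The IDs keep internal addresses hidden from clients:

```ruby
# Hypothetical cookie name; real load balancers use their own, often configurable, names.
AFFINITY_COOKIE = 'lb_backend'

# backends: opaque instance ID => address. cookies: the request's cookies as a Hash.
# Returns [address, cookies_to_set_on_response].
def sticky_route(backends, cookies)
  pinned_id = cookies[AFFINITY_COOKIE]
  # Honour the cookie only if it still names a live backend.
  return [backends[pinned_id], {}] if backends.key?(pinned_id)

  # No cookie, or a stale one: pick a backend and ask the client to remember it.
  id = backends.keys.sample
  [backends[id], { AFFINITY_COOKIE => id }]
end

backends = { 'a' => '10.0.0.1:3000', 'b' => '10.0.0.2:3000' }
sticky_route(backends, { 'lb_backend' => 'b' }) # => ["10.0.0.2:3000", {}]
```

If the pinned instance disappears, the client is silently reassigned and any in-memory session on the old instance is lost.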
Technical Notes
- Session affinity (sticky sessions) pins a user to one instance based on a cookie or IP hash. This works against horizontal scaling because load can become uneven and a user's session is lost if their instance fails. Prefer stateless services with a centralised session store (e.g. Redis)
- Health checks must test actual application health, not just TCP connectivity. An instance with a full connection pool or a hung thread pool should be removed from rotation
- L7 load balancers (nginx, ALB) enable path-based and header-based routing, essential for microservices and API gateways
- Connection draining: when removing an instance, wait for in-flight requests to complete rather than dropping them
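Connection draining separates "stop sending new requests" from "shut the instance down". Below is a minimal sketch that assumes the balancer counts in-flight requests per instance. `DrainingPool` and its methods are hypothetical, and thread safety is omitted:

```ruby
# Minimal sketch of connection draining. New requests go only to active
# instances; a draining instance finishes its in-flight work before removal.
class DrainingPool
  def initialize(instances)
    @active = instances.dup
    @in_flight = Hash.new(0)
  end

  # Pick the active instance with the fewest in-flight requests.
  def start_request
    instance = @active.min_by { |i| @in_flight[i] }
    raise 'no active instances' unless instance

    @in_flight[instance] += 1
    instance
  end

  def finish_request(instance)
    @in_flight[instance] -= 1
  end

  # Stop routing new traffic to the instance; existing requests keep running.
  def drain(instance)
    @active.delete(instance)
  end

  # True once the instance receives no new traffic and has nothing in flight,
  # so it can be shut down without dropping requests.
  def drained?(instance)
    !@active.include?(instance) && @in_flight[instance].zero?
  end
end
```

In practice the wait is bounded by a timeout so that a stuck request cannot block a deploy indefinitely. AWS target groups call this setting the deregistration delay.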