Service Mesh
Infrastructure layer for handling service-to-service communication, observability, and security at scale.
Overview
A service mesh is an infrastructure layer that handles service-to-service communication inside a cluster. A sidecar proxy (typically Envoy) runs alongside each service instance and intercepts its inbound and outbound network traffic. The mesh provides mTLS encryption, load balancing, circuit breaking, retries, distributed tracing, and health monitoring, largely without changes to application code. The main exception is tracing: services still need to forward trace headers on their outbound calls so that spans can be joined into a single trace.
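To make "circuit breaking without code changes" concrete, here is a minimal sketch of the kind of logic a service would otherwise have to embed itself. It is a generic in-process circuit breaker, not a description of how Envoy implements the feature; the class name, the `failure_threshold` and `reset_after` parameters, and the injectable `clock` are all assumptions made for illustration.

```ruby
# Generic circuit breaker sketch (illustrative names, not an Envoy or Istio API).
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold: 3, reset_after: 30, clock: -> { Time.now.to_f })
    @failure_threshold = failure_threshold # consecutive failures before opening
    @reset_after = reset_after             # seconds to wait before a trial call
    @clock = clock                         # injectable so the sketch is testable
    @failures = 0
    @opened_at = nil
  end

  # The circuit is open while the reset window has not yet elapsed.
  def open?
    !@opened_at.nil? && (@clock.call - @opened_at) < @reset_after
  end

  # Runs the block unless the circuit is open; tracks consecutive failures.
  def call
    raise OpenError, "circuit open" if open?

    begin
      result = yield
    rescue StandardError
      @failures += 1
      @opened_at = @clock.call if @failures >= @failure_threshold
      raise
    end
    # A successful call closes the circuit and resets the failure count.
    @failures = 0
    @opened_at = nil
    result
  end
end
```

With a mesh, this state lives in the proxy and is configured per destination, so every service gets the same behaviour regardless of the language it is written in.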
Origin
Lyft built Envoy as its internal proxy and open-sourced it in 2016. Google, IBM, and Lyft co-created Istio (2017) as a control plane over Envoy. Linkerd (Buoyant, 2016; rewritten as Linkerd2 in 2018) offers a simpler alternative. AWS App Mesh (2019), a managed service, and HashiCorp's Consul Connect are further options.
Examples
What a service mesh handles without code changes
# Without a service mesh, every service implements its own retry logic:
class OrderService
  def fetch_user(id)
    # Manual retry logic: up to 3 attempts in total
    retries = 3
    begin
      # `http` stands for whatever HTTP client the service uses
      response = http.get("http://user-service/users/#{id}", timeout: 2)
      raise "Error" unless response.success?
      response
    rescue StandardError
      retries -= 1
      retry if retries > 0
      raise
    end
  end
end

# With a service mesh (Istio/Envoy sidecar), the above becomes just:
class OrderService
  def fetch_user(id)
    http.get("http://user-service/users/#{id}") # sidecar handles retries, mTLS, tracing
  end
end

# Istio VirtualService (YAML) configures retries outside the code:
# retries:
#   attempts: 3
#   perTryTimeout: 2s
#   retryOn: gateway-error,connect-failure,retriable-4xx
Use Cases
- mTLS between services: enforce that only authorised services can communicate, without code changes
- Distributed tracing: the sidecars generate trace IDs and spans for every request; services only need to forward the trace headers on their outbound calls
- Traffic management: canary deployments, traffic splitting, and fault injection for chaos testing
- Observability: latency, error rate, and throughput per service pair, reported by the mesh rather than by the application
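For the tracing use case, the sidecar can only join spans into one trace if each service copies the incoming trace headers onto its outbound calls. The sketch below shows one way to do that. The header list covers the B3 and W3C Trace Context headers commonly used with Envoy-based meshes; the module and method names are hypothetical, and the exact set of headers depends on the tracing backend in use.

```ruby
# Sketch of trace-header forwarding (module and method names are illustrative).
module TraceContext
  # Headers commonly propagated in Envoy-based meshes (B3 and W3C Trace Context).
  HEADERS = %w[
    x-request-id
    x-b3-traceid
    x-b3-spanid
    x-b3-parentspanid
    x-b3-sampled
    x-b3-flags
    traceparent
    tracestate
  ].freeze

  # Returns the subset of inbound headers to copy onto outbound requests.
  # Names are matched case-insensitively and returned lowercased.
  def self.headers_to_forward(inbound_headers)
    inbound_headers.each_with_object({}) do |(name, value), forwarded|
      key = name.to_s.downcase
      forwarded[key] = value if HEADERS.include?(key)
    end
  end
end

inbound = {
  "X-Request-Id" => "abc-123",
  "X-B3-TraceId" => "463ac35c9f6413ad",
  "Accept" => "application/json"
}
outbound = TraceContext.headers_to_forward(inbound)
# outbound contains only the x-request-id and x-b3-traceid entries
```

The returned hash would then be merged into the headers of every downstream request made while handling the inbound one.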
When Not to Use
- Small numbers of services, where the operational complexity of a mesh such as Istio exceeds the benefit
- Teams without Kubernetes expertise: service meshes are complex to configure and debug
- Performance-sensitive paths where the sidecar adds measurable latency (typically 1-5ms per hop)
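The per-hop figure adds up along a call chain. As a rough back-of-the-envelope sketch, assuming the 1-5ms range above applies uniformly to every hop (real overhead varies with payload size, proxy configuration, and load), the added latency for a request can be estimated like this. The method name and the example hop count are illustrative.

```ruby
# Estimated added latency range (in ms) for a request that crosses `hops`
# service-to-service calls, assuming a uniform per-hop overhead range.
def mesh_overhead_ms(hops, per_hop_ms: 1..5)
  (per_hop_ms.min * hops)..(per_hop_ms.max * hops)
end

overhead = mesh_overhead_ms(4)
puts "4 hops add roughly #{overhead.min}-#{overhead.max}ms"
```

A request that fans out through four sequential hops would therefore carry an estimated 4-20ms of mesh overhead, which matters far more against a 50ms latency budget than against a 2s one.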
Technical Notes
- The sidecar pattern: each service pod runs an Envoy proxy container. All ingress and egress traffic is transparently routed through it via iptables rules
- Istio's control plane (Istiod) pushes configuration to Envoy sidecars. Understanding the xDS protocol (Endpoint, Cluster, Route, Listener discovery) is key to debugging mesh configuration
- Linkerd2 uses a Rust-based micro-proxy (linkerd2-proxy) built specifically for the sidecar role. It is generally lighter on memory and CPU than Envoy, which can make it a better fit for latency-sensitive or resource-constrained environments
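The configuration that Istiod translates into xDS updates starts out as Kubernetes resources such as a VirtualService. As a sketch of what such a resource contains, the Ruby below builds one as a plain Hash and prints it as YAML, combining the retry settings from the example above with a 90/10 canary split. The field names follow Istio's VirtualService schema; the `user-service` host and the `stable`/`canary` subset names are hypothetical, and the subsets would need to be defined in a separate DestinationRule, which is not shown.

```ruby
require "yaml"

# Builds an Istio VirtualService manifest as a Hash.
# Host and subset names are placeholders for illustration.
def virtual_service(host:, stable_weight:, canary_weight:)
  raise ArgumentError, "weights must sum to 100" unless stable_weight + canary_weight == 100

  {
    "apiVersion" => "networking.istio.io/v1beta1",
    "kind" => "VirtualService",
    "metadata" => { "name" => host },
    "spec" => {
      "hosts" => [host],
      "http" => [
        {
          # Weighted routing: most traffic to stable, a small share to canary.
          "route" => [
            { "destination" => { "host" => host, "subset" => "stable" }, "weight" => stable_weight },
            { "destination" => { "host" => host, "subset" => "canary" }, "weight" => canary_weight }
          ],
          # Retry policy applied by the sidecar, not by application code.
          "retries" => {
            "attempts" => 3,
            "perTryTimeout" => "2s",
            "retryOn" => "gateway-error,connect-failure,retriable-4xx"
          }
        }
      ]
    }
  }
end

manifest = virtual_service(host: "user-service", stable_weight: 90, canary_weight: 10)
puts YAML.dump(manifest)
```

Generating the manifest from code is only one option; the same structure is more commonly written directly as YAML and applied with the cluster's usual deployment tooling.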