Architecture

Service Mesh

Infrastructure layer for handling service-to-service communication, observability, and security at scale.

Overview

A service mesh is an infrastructure layer that handles service-to-service communication inside a cluster. A sidecar proxy (typically Envoy) runs alongside each service instance, intercepting all network traffic. The mesh provides mTLS encryption, load balancing, circuit breaking, retries, distributed tracing, and health monitoring, without changes to application code.
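To make one of these features concrete, here is a minimal sketch of the classic circuit-breaker pattern in Ruby. The `CircuitBreaker` class, its parameters, and the injectable clock are all hypothetical names chosen for illustration. Envoy's own mechanisms (connection and request limits plus outlier detection) differ in detail; the sketch only shows the kind of logic a sidecar takes out of application code.

```ruby
# Illustrative circuit breaker (hypothetical class, not part of any mesh API).
class CircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold: 3, reset_timeout: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold # consecutive failures before opening
    @reset_timeout = reset_timeout         # seconds to wait before a trial call
    @clock = clock                         # injectable so the logic can be tested
    @failures = 0
    @state = :closed
    @opened_at = nil
  end

  # Runs the block unless the circuit is open.
  def call
    if @state == :open
      # Fail fast until the reset timeout has elapsed, then allow one trial call.
      raise OpenError, "circuit open" if @clock.call - @opened_at < @reset_timeout
      @state = :half_open
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  private

  def record_failure
    @failures += 1
    # A failed trial call in the half-open state re-opens the circuit immediately.
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end

  def reset
    @failures = 0
    @state = :closed
  end
end
```

A caller would wrap each outbound request, for example `breaker.call { http.get(url) }`, and treat `CircuitBreaker::OpenError` as an immediate failure. With a mesh, equivalent behaviour is configured on the proxy instead of being written in every service.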

Origin

Lyft built Envoy as its internal proxy and open-sourced it in 2016. Google, IBM, and Lyft co-created Istio (2017) as a control plane over Envoy. Linkerd (Buoyant, 2016; rewritten as Linkerd2 in 2018) offers a simpler alternative. AWS App Mesh (2019) is a managed option, and HashiCorp's Consul Connect is another alternative.

Examples

What a service mesh handles without code changes

# Without a service mesh, every service implements its own retry logic:
class OrderService
  def fetch_user(id)
    # Manual retry logic: up to 3 attempts in total
    attempts_left = 3
    begin
      # `http` stands for whatever HTTP client the service uses
      response = http.get("http://user-service/users/#{id}", timeout: 2)
      raise "user-service returned an error" unless response.success?
      response
    rescue StandardError
      attempts_left -= 1
      retry if attempts_left > 0
      raise
    end
  end
end

# With a service mesh (Istio/Envoy sidecar):
# The above becomes just:
class OrderService
  def fetch_user(id)
    http.get("http://user-service/users/#{id}")  # sidecar handles retries, mTLS, tracing
  end
end

# Istio VirtualService (YAML) configures retries outside the code:
# retries:
#   attempts: 3
#   perTryTimeout: 2s
#   retryOn: gateway-error,connect-failure,retriable-4xx

Use Cases

  • mTLS between services: enforce that only authorised services can communicate, without code changes
  • Distributed tracing: the sidecars generate spans and trace IDs for every request; the application only has to forward the incoming trace headers on its outbound calls so that the spans join into one trace
  • Traffic management: canary deployments, traffic splitting, and fault injection for chaos testing
  • Observability: latency, error rate, and throughput per service pair, reported by the mesh rather than by the application
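Tracing is the one use case here that usually needs a little application code: the sidecar emits spans, but it cannot tell which outbound call belongs to which inbound request, so the service has to copy the trace headers across. The sketch below assumes the header set listed in Istio's tracing documentation (B3 and W3C Trace Context); `TRACE_HEADERS` and `trace_headers_from` are hypothetical names, and other meshes or tracers may expect different headers.

```ruby
# Trace headers to forward, as listed in Istio's tracing documentation.
# Other meshes or tracing backends may use a different set.
TRACE_HEADERS = %w[
  x-request-id
  x-b3-traceid
  x-b3-spanid
  x-b3-parentspanid
  x-b3-sampled
  x-b3-flags
  b3
  traceparent
  tracestate
].freeze

# Hypothetical helper: returns only the trace headers from an incoming
# request's header hash, with names normalised to lower case.
def trace_headers_from(incoming)
  incoming.each_with_object({}) do |(name, value), out|
    key = name.to_s.downcase
    out[key] = value if TRACE_HEADERS.include?(key)
  end
end
```

The result would be merged into the headers of each outbound call made while handling that request; how headers are passed depends on the HTTP client in use.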

When Not to Use

  • Small numbers of services, where the operational complexity of Istio exceeds the benefit
  • Teams without Kubernetes expertise, because service meshes are complex to configure and debug
  • Performance-sensitive paths where the sidecar adds measurable latency (typically 1-5ms per hop)
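The latency point is easiest to judge with quick arithmetic. The sketch below assumes the 1-5ms per-hop figure quoted above and a chain of calls made one after another; `mesh_overhead_ms` is a hypothetical helper, and real overhead depends on proxy configuration and payload size.

```ruby
# Returns [best_case, worst_case] added latency in milliseconds for a
# request that crosses `hops` sequential service-to-service calls,
# assuming each hop adds somewhere within `per_hop_ms`.
def mesh_overhead_ms(hops, per_hop_ms: 1..5)
  [hops * per_hop_ms.min, hops * per_hop_ms.max]
end

mesh_overhead_ms(4) # => [4, 20]
```

A request that fans through four services in sequence would therefore pick up roughly 4 to 20 ms under these assumptions, which matters for a 50 ms latency budget and is negligible for a 2 s one.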

Technical Notes

  • The sidecar pattern: each service pod runs an Envoy proxy container. All ingress and egress traffic is transparently routed through it via iptables rules.
  • Istio's control plane (Istiod) pushes configuration to the Envoy sidecars. Understanding the xDS protocol (the Endpoint, Cluster, Route, and Listener discovery services) is key to debugging mesh configuration.
  • Linkerd2 uses a Rust-based micro-proxy that is significantly lighter than Envoy, making it a better choice for latency-sensitive or resource-constrained environments.