Performance

Profiling & Benchmarking

Measuring before optimising: tools and approaches for finding the actual bottleneck.

Overview

Profiling measures where a program spends its time and allocates memory, enabling targeted optimisation. It is the antidote to guessing: intuitions about bottlenecks are often wrong. CPU profiling identifies hot functions (where time is spent); memory profiling identifies allocation sources; flame graphs visualise call stacks weighted by the time spent in them. Profile before optimising, then measure again afterwards to confirm the improvement.

Origin

gprof, introduced with Berkeley Unix in 1982 and later reimplemented as GNU gprof, was among the first widely used profilers. Commercial Java profilers such as JProfiler and YourKit emerged in the 2000s. V8's CPU profiler and the Chrome DevTools performance tooling made JavaScript profiling accessible around 2010. Brendan Gregg developed flame graphs in 2011, while at Joyent, as a visual format for stack profiling data; he later continued the work at Netflix. rbspy (2018) is a sampling profiler for Ruby that attaches to a running process without modifying it.

Examples

CPU profiling in Node.js with the V8 profiler and clinic.js flame

// Method 1: V8 built-in profiler via --prof flag
// node --prof server.js
// node --prof-process isolate-*.log > processed.txt
// Look for functions consuming the highest percentage of ticks

// Method 2: clinic.js flame (sampling profiler that produces a flame graph; needs no code changes)
// npx clinic flame -- node server.js
// Then load test: npx autocannon -c 100 -d 30 http://localhost:3000/api/orders

// Method 3: programmatic profiling with Node's built-in inspector module
import { Session } from 'inspector';
import fs from 'fs';

// Profiles one async block and writes <label>.cpuprofile to the working directory
async function profileBlock<T>(
  label: string,
  fn: () => Promise<T>
): Promise<T> {
  const session = new Session();
  session.connect();

  try {
    await new Promise<void>((resolve, reject) =>
      session.post('Profiler.enable', err => (err ? reject(err) : resolve()))
    );
    await new Promise<void>((resolve, reject) =>
      session.post('Profiler.start', err => (err ? reject(err) : resolve()))
    );

    // Run the code under test while the profiler samples it
    const result = await fn();

    const profile = await new Promise<any>((resolve, reject) =>
      session.post('Profiler.stop', (err, data) => (err ? reject(err) : resolve(data)))
    );

    fs.writeFileSync(label + '.cpuprofile', JSON.stringify(profile.profile));
    return result;
  } finally {
    // Always release the inspector session, even if fn() throws
    session.disconnect();
  }
}

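The .cpuprofile file can be loaded in Chrome DevTools (Performance > Load Profile) or the VS Code JavaScript debugger to visualise the flame graph. clinic.js flame needs no code changes: it launches the server under the profiler, so it is typically paired with a load-testing tool such as autocannon to produce representative traffic.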
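
A .cpuprofile file is plain JSON, so a quick ranking of hot functions is possible without opening a viewer. The sketch below is illustrative only: topSelfSamples and the trimmed-down CpuProfile interface are hypothetical names, and the code assumes the file contains the nodes and samples arrays that V8's Profiler.stop returns. It counts how often each function was at the top of the stack when a sample was taken.

```typescript
// Minimal shape of the fields used here; real .cpuprofile files carry more.
interface CpuProfileNode {
  id: number;
  callFrame: { functionName: string; url: string; lineNumber: number };
}

interface CpuProfile {
  nodes: CpuProfileNode[];
  samples?: number[]; // id of the node at the top of the stack for each sample
}

interface HotFunction {
  name: string;
  samples: number;
  share: number; // fraction of all samples, between 0 and 1
}

// Hypothetical helper: rank functions by self samples (time spent in the
// function itself, excluding callees).
function topSelfSamples(profile: CpuProfile, limit = 10): HotFunction[] {
  const nodesById = new Map<number, CpuProfileNode>();
  for (const node of profile.nodes) nodesById.set(node.id, node);

  const counts = new Map<string, number>();
  const samples = profile.samples ?? [];
  for (const id of samples) {
    const node = nodesById.get(id);
    if (!node) continue; // ignore samples that reference an unknown node
    const { functionName, url, lineNumber } = node.callFrame;
    const key = `${functionName || '(anonymous)'} (${url}:${lineNumber})`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const total = samples.length || 1; // avoid dividing by zero on empty profiles
  return [...counts.entries()]
    .map(([name, n]) => ({ name, samples: n, share: n / total }))
    .sort((a, b) => b.samples - a.samples)
    .slice(0, limit);
}
```

Assuming a profile was saved under the hypothetical name orders.cpuprofile, it could be summarised with topSelfSamples(JSON.parse(fs.readFileSync('orders.cpuprofile', 'utf8'))). Sample counts are an approximation of time; a viewer remains the better tool for exploring call paths.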

Ruby profiling with StackProf and rack-mini-profiler

require 'stackprof'

# Sampling profiler: interval is in microseconds, so 1000 gives about 1000 samples/second
# Overhead is low, so it is generally safe to enable in production for specific requests
def profile_block(label)
  # raw: true keeps the per-sample stacks that flamegraph output needs
  profile = StackProf.run(mode: :cpu, interval: 1000, raw: true) do
    yield
  end

  puts "== #{label} =="
  StackProf::Report.new(profile).print_text(false, 20) # top 20 methods by self samples
  # Or generate flamegraph: StackProf::Report.new(profile).print_d3_flamegraph
end

# rack-mini-profiler: per-request SQL query count and timing in development
# Gemfile: gem 'rack-mini-profiler', plus gem 'stackprof' (older versions: gem 'flamegraph')
# for flamegraphs and gem 'memory_profiler' for memory reports
# Auto-enabled in development; shows timing bar in top-left corner
# ?pp=flamegraph on any URL generates a CPU flamegraph for that request
# ?pp=profile-memory generates a memory allocation report for that request

# Finding slow database queries: pg_stat_statements extension
# (column names below are PostgreSQL 13+; older versions use mean_time and total_time)
# SELECT query, calls, mean_exec_time, total_exec_time
# FROM pg_stat_statements
# ORDER BY mean_exec_time DESC
# LIMIT 20;

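StackProf samples in wall-clock or CPU mode and requires no code instrumentation beyond the block being wrapped. rack-mini-profiler shows per-request timing breakdowns including SQL queries, so you can see both the query count and the time taken for a single HTTP request. It can be used in production if access is restricted, for example by calling Rack::MiniProfiler.authorize_request in a before_action that runs only for administrators.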

Use Cases

  • Identifying the real bottleneck before optimising; the slowest function is frequently not the one you expected, and optimising the wrong function wastes effort
  • Regression detection: profiling before and after a change quantifies the performance impact
  • Production performance investigations, where sampling profilers typically add little overhead (often around 1%) and can be attached to live processes
  • Memory allocation profiling to identify which code paths create the most garbage, informing object pooling or streaming decisions
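
For the regression-detection case, the before/after comparison can be sketched as below. This is a minimal illustration, not a benchmarking framework: median, timeRuns and relativeChange are hypothetical helper names, and the run and warm-up counts are arbitrary defaults. Warm-up runs are discarded so that JIT compilation and cold caches skew the result less, and the median is used because it is less sensitive to outliers than the mean.

```typescript
import { performance } from 'node:perf_hooks';

// Median of a list of numbers (the input array is not modified).
function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time `fn` for `runs` iterations after `warmup` untimed iterations.
// Returns one duration in milliseconds per timed run.
function timeRuns(fn: () => void, runs = 30, warmup = 5): number[] {
  for (let i = 0; i < warmup; i++) fn();
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    durations.push(performance.now() - start);
  }
  return durations;
}

// Relative change from baseline to candidate; negative means faster.
function relativeChange(baselineMs: number, candidateMs: number): number {
  return (candidateMs - baselineMs) / baselineMs;
}
```

A typical use is relativeChange(median(timeRuns(oldImpl)), median(timeRuns(newImpl))), where oldImpl and newImpl stand for the two versions being compared. Small differences should be treated with suspicion unless they hold up across repeated runs under realistic load.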

When Not to Use

  • Do not profile in development under unrealistic conditions (no data, no load); profile under production-like load to get representative measurements
  • Do not optimise code that is not on the hot path; the 80/20 rule applies: code that accounts for a small share of total time is rarely worth optimising, even if it looks inefficient
  • Do not trust microbenchmarks in isolation; V8's JIT compiler optimises hot loops in ways that may not carry over to real production code with varied inputs

Technical Notes

  • Sampling (statistical) profilers interrupt the program at regular intervals and record the current stack; their overhead is usually low enough for production use. Instrumenting profilers add timing around every function call; they give exact call counts but add substantial overhead (often quoted as 5-30%) and can distort the timing of small, frequently called functions
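  • Flame graphs (Brendan Gregg, 2011): each box is a stack frame, and its width is proportional to the number of samples in which that frame was on the stack (time spent in the function plus its callees). The x-axis is not a timeline: frames are sorted alphabetically so that identical stacks merge. The y-axis shows call stack depth. Wide frames at the top of a stack are where CPU time is spent directly. Brendan Gregg's flamegraph.pl renders folded stack samples as an interactive SVG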
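  • Async profiling in Node.js is complicated by the event loop: a CPU profile shows only time spent executing JavaScript, not time spent waiting on I/O, and asynchronous callbacks lose the stack of the code that scheduled them. async_hooks (Node 8+) allows tracking async context; clinic.js doctor can flag event loop delay, and clinic.js bubbleprof visualises time spent in asynchronous operations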
  • Database query profiling: pg_stat_statements (PostgreSQL extension) aggregates query statistics across all connections; EXPLAIN ANALYZE on individual slow queries provides the execution plan. Together they identify both what is slow (pg_stat_statements) and why (EXPLAIN)
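
To make the relationship between stack samples and flame graphs concrete, the sketch below collapses a list of sampled stacks into the "folded" text format (frames joined by semicolons, followed by a sample count) that flamegraph.pl accepts as input. It is an illustration under stated assumptions: foldStacks, inclusiveShare and selfShare are hypothetical names, and samples are assumed to be arrays of function names listed root first.

```typescript
// Each sample is one captured call stack, listed root first.
type StackSample = string[];

// Collapse samples into folded lines: "frame1;frame2;frame3 count".
// Lines are sorted so that identical prefixes sit next to each other.
function foldStacks(samples: StackSample[]): string[] {
  const counts = new Map<string, number>();
  for (const stack of samples) {
    const key = stack.join(';');
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([stack, count]) => `${stack} ${count}`);
}

// Fraction of samples with `frame` anywhere on the stack
// (time in the function plus its callees).
function inclusiveShare(samples: StackSample[], frame: string): number {
  if (samples.length === 0) return 0;
  const hits = samples.filter(stack => stack.includes(frame)).length;
  return hits / samples.length;
}

// Fraction of samples with `frame` at the top of the stack
// (time spent in the function itself).
function selfShare(samples: StackSample[], frame: string): number {
  if (samples.length === 0) return 0;
  const hits = samples.filter(stack => stack[stack.length - 1] === frame).length;
  return hits / samples.length;
}
```

A function with a high inclusive share but a low self share is wide near the bottom of the graph and is mostly waiting on its callees; a high self share marks the code that is actually consuming CPU.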