
Rate Limiting & Brute-Force Defence

Throttling requests to protect endpoints from automated abuse.

Overview

Rate limiting restricts how many requests a client can make in a given time window. This protects APIs from abuse, application-layer denial of service, credential stuffing, and brute-force attacks. Common algorithms are fixed window, sliding window, token bucket, and leaky bucket. Rate limits should be enforced both at the edge (CDN, API gateway) and at the application layer. Redis is a common backing store for rate limit state shared across application instances.

Origin

Rate limiting predates web APIs; SMTP servers implemented limits in the 1990s to reduce spam. API rate limiting became mainstream with Twitter's 1.0 API (2006) and its 150-requests-per-hour limit. Token bucket and leaky bucket algorithms were described in networking literature in the 1980s. Redis-based implementations (redis-cell, Upstash rate limiting) made distributed limiting practical.

Examples

Redis-backed sliding window rate limiter in TypeScript

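import Redis from 'ioredis';
import { randomUUID } from 'node:crypto';
import { Request, Response, NextFunction } from 'express';

const redis = new Redis(process.env.REDIS_URL!);

type LimitResult = { allowed: boolean; remaining: number; resetAt: number };

async function slidingWindowLimit(
  key: string,
  maxRequests: number,
  windowSec: number
): Promise<LimitResult> {
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  // Unique member so two requests in the same millisecond are not merged into one entry
  const member = `${now}-${randomUUID()}`;

  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);       // remove entries outside the window
  pipeline.zadd(key, now, member);                      // add current request
  pipeline.zcard(key);                                  // count in window
  pipeline.expire(key, windowSec);                      // auto-cleanup of idle keys
  const results = await pipeline.exec();
  const zcard = results?.[2];                           // [error, result] for ZCARD
  if (!zcard || zcard[0]) throw zcard?.[0] ?? new Error('Redis pipeline failed');
  const count = zcard[1] as number;

  const allowed = count <= maxRequests;
  if (!allowed) {
    // Drop the rejected attempt so a client that keeps retrying is not
    // locked out indefinitely by its own rejected requests
    await redis.zrem(key, member);
  }

  return {
    allowed,
    remaining: Math.max(0, maxRequests - count),
    resetAt: now + windowSec * 1000,                    // upper bound: all current entries have expired
  };
}

export function rateLimitMiddleware(maxRequests: number, windowSec: number) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // Behind a proxy or load balancer, enable Express's 'trust proxy' setting so req.ip is the client
    const key = 'rl:' + (req.ip ?? 'unknown') + ':' + req.path;
    let result: LimitResult;
    try {
      result = await slidingWindowLimit(key, maxRequests, windowSec);
    } catch (err) {
      // Express 4 does not catch rejected promises from async handlers; forward the error
      next(err);
      return;
    }
    res.setHeader('X-RateLimit-Limit', maxRequests);
    res.setHeader('X-RateLimit-Remaining', result.remaining);
    res.setHeader('X-RateLimit-Reset', Math.ceil(result.resetAt / 1000));
    if (!result.allowed) {
      res.setHeader('Retry-After', windowSec);          // conservative: the whole window has passed by then
      res.status(429).json({ error: 'Rate limit exceeded. Try again later.' });
      return;
    }
    next();
  };
}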

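A sorted set (ZSET) stores one entry per request, scored by its timestamp; ZREMRANGEBYSCORE prunes entries older than the window before ZCARD counts the rest. Sending all four commands in a single pipeline saves round trips, although a pipeline is not atomic. This sliding window log is more accurate than a fixed window, which can admit up to twice the limit in a burst that straddles a window boundary, but it stores every request in the window.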
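The boundary effect can be reproduced with two small in-memory counters: a fixed window, and the sliding window counter approximation mentioned in the Technical Notes, which weights the previous window's count by how much of that window still overlaps the sliding one. This is a single-process sketch with hypothetical class names, not a distributed limiter; in production the counts would live in a shared store such as Redis.

```typescript
// Single-process sketch; FixedWindowCounter and SlidingWindowCounter are hypothetical names.
class FixedWindowCounter {
  private windowId = -1;
  private count = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    const id = Math.floor(nowMs / this.windowMs);
    if (id !== this.windowId) {
      // Entering a new window resets the count to zero.
      this.windowId = id;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

class SlidingWindowCounter {
  private windowId = -1;
  private current = 0;
  private previous = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    const id = Math.floor(nowMs / this.windowMs);
    if (id !== this.windowId) {
      // Keep the old count only if it belongs to the immediately preceding window.
      this.previous = id === this.windowId + 1 ? this.current : 0;
      this.current = 0;
      this.windowId = id;
    }
    // Weight the previous window by the share of it still inside the sliding window.
    const overlap = 1 - (nowMs % this.windowMs) / this.windowMs;
    if (this.previous * overlap + this.current >= this.limit) return false;
    this.current++;
    return true;
  }
}

// Counts how many of the given request times a limiter admits, in order.
function countAllowed(limiter: { allow(nowMs: number): boolean }, times: number[]): number {
  return times.filter((t) => limiter.allow(t)).length;
}

// Ten requests just before the 60 s boundary and ten just after it.
const burst: number[] = [...new Array<number>(10).fill(59_000), ...new Array<number>(10).fill(60_500)];
console.log(countAllowed(new FixedWindowCounter(10, 60_000), burst)); // 20
console.log(countAllowed(new SlidingWindowCounter(10, 60_000), burst)); // 11
```

With a limit of 10 per minute, the fixed window admits all 20 requests within 1.5 seconds, while the sliding counter admits 11. The counter needs only two integers per key, but its estimate assumes requests in the previous window were spread evenly.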

Rate limiting in Rails with rack-attack

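# config/initializers/rack_attack.rb
class Rack::Attack
  # Use Redis for distributed counting
  Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
    url: ENV.fetch('REDIS_URL')
  )

  # Throttle login attempts per IP: 5 per 20 seconds
  throttle('login/ip', limit: 5, period: 20.seconds) do |req|
    req.ip if req.path == '/api/v1/auth/login' && req.post?
  end

  # Throttle login attempts per email: 10 per hour, limiting guessing against one
  # account even when attempts come from many IPs. req.params only sees query and
  # form parameters, so a JSON login body must be parsed to read the email.
  throttle('login/email', limit: 10, period: 1.hour) do |req|
    if req.path == '/api/v1/auth/login' && req.post?
      req.params['email'].to_s.downcase.strip.presence
    end
  end

  # Allow health checks to bypass throttling
  safelist('health-check') do |req|
    req.path == '/health'
  end

  # Custom response for throttled requests. The responder receives the request
  # object (use request.env for the Rack env); match_data describes the throttle hit.
  self.throttled_responder = lambda do |request|
    match_data = request.env['rack.attack.match_data']
    period = match_data[:period].to_i
    # Seconds until the current fixed period ends and the counter resets
    retry_after = period - (match_data[:epoch_time] % period)
    # Lowercase header names are required by Rack 3 and accepted by Rack 2
    [429, { 'content-type' => 'application/json', 'retry-after' => retry_after.to_s },
     ['{"error":"Rate limit exceeded"}']]
  end
end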

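rack-attack (v6.7+) runs as Rack middleware, before Rails routing, so limits apply even to malformed requests that never reach a controller. Per-email throttling caps password guessing against a single account even when attackers spread attempts across many IPs. It is weaker against credential stuffing, where each leaked username/password pair is usually tried only once, so the per-IP rule and edge defences remain necessary.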
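rack-attack's throttle counts requests per fixed period (its cache key includes the current period number), so the two rules above act as independent fixed-window counters. The TypeScript sketch below roughly mirrors them for a Node service. LoginThrottle, Rule, and the in-memory Map are hypothetical stand-ins for a shared store, and the email normalisation should match whatever the authentication code itself does.

```typescript
// Hypothetical sketch; the Map stands in for a shared store such as Redis.
type Rule = { name: string; limit: number; periodSec: number };

class LoginThrottle {
  private readonly counts = new Map<string, number>(); // never pruned in this sketch
  private readonly ipRule: Rule;
  private readonly emailRule: Rule;

  constructor(ipRule: Rule, emailRule: Rule) {
    this.ipRule = ipRule;
    this.emailRule = emailRule;
  }

  // Fixed window: the period number is part of the key, so counts reset each period.
  private hit(rule: Rule, discriminator: string, nowSec: number): boolean {
    const key = `${rule.name}:${Math.floor(nowSec / rule.periodSec)}:${discriminator}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count <= rule.limit;
  }

  allow(ip: string, rawEmail: unknown, nowSec: number): boolean {
    // Rules are checked in order; the first exceeded rule rejects the request.
    if (!this.hit(this.ipRule, ip, nowSec)) return false;
    // Normalise like the Ruby rule; a blank email skips the per-email check.
    const email = typeof rawEmail === 'string' ? rawEmail.trim().toLowerCase() : '';
    return email === '' || this.hit(this.emailRule, email, nowSec);
  }
}

const ipRule: Rule = { name: 'login/ip', limit: 5, periodSec: 20 };
const emailRule: Rule = { name: 'login/email', limit: 10, periodSec: 3600 };

// Twelve attempts on one account, each from a different IP address.
const limiter = new LoginThrottle(ipRule, emailRule);
const admitted = Array.from({ length: 12 }, (_, i) =>
  limiter.allow(`198.51.100.${i}`, ' Victim@Example.com ', 1000)
).filter(Boolean).length;
console.log(admitted); // 10
```

The distributed attempts never trip the per-IP rule, but only the first ten pass the per-email rule within the hour.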

Use Cases

01 Authentication endpoints, where limiting attempts per IP and per email mitigates brute-force and credential-stuffing attacks
02 Public API endpoints, where per-API-key limits enforce fair use and stop a single consumer from monopolising resources
03 Password reset and OTP endpoints, where unlimited attempts would allow account enumeration or brute-forcing of codes
04 File upload and computationally expensive endpoints, where rate limits prevent resource exhaustion

When Not to Use

// Do not rate limit internal service-to-service calls at the application layer; use network policy and circuit breakers instead.
// Do not use IP-based rate limiting as the sole defence for authenticated endpoints; users behind a corporate NAT share an IP and would be blocked collectively.
// Do not return information about remaining attempts from security-sensitive endpoints (login, password reset); it leaks information useful to automated attacks.
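One way to apply the NAT and remaining-attempts points above is to key limits on the authenticated account when there is one, fall back to the client IP otherwise, and omit the remaining count on sensitive paths. In this sketch ClientInfo, its userId field (assumed to be set by upstream authentication), and the password-reset path are hypothetical.

```typescript
// Hypothetical request shape; userId is assumed to be filled in by upstream auth middleware.
interface ClientInfo {
  ip?: string;
  userId?: string;
  path: string;
}

// Prefer the account ID so users sharing a NAT address get separate budgets.
function rateLimitKey(client: ClientInfo): string {
  const who = client.userId ? `user:${client.userId}` : `ip:${client.ip ?? 'unknown'}`;
  return `rl:${who}:${client.path}`;
}

// Which paths count as security-sensitive is an assumption; the reset path is hypothetical.
const SENSITIVE_PATHS = new Set(['/api/v1/auth/login', '/api/v1/auth/password-reset']);

// Builds rate limit headers, leaving out the remaining count on sensitive endpoints.
function rateLimitHeaders(
  path: string,
  limit: number,
  remaining: number,
  resetEpochSec: number
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Reset': String(resetEpochSec),
  };
  if (!SENSITIVE_PATHS.has(path)) headers['X-RateLimit-Remaining'] = String(remaining);
  return headers;
}

console.log(rateLimitKey({ ip: '203.0.113.7', userId: '42', path: '/api/orders' })); // rl:user:42:/api/orders
console.log(rateLimitKey({ ip: '203.0.113.7', path: '/api/orders' })); // rl:ip:203.0.113.7:/api/orders
```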

Technical Notes

A token bucket allows bursts up to the bucket capacity, then sustains a steady refill rate. A leaky bucket (as an output queue) enforces a constant outflow rate with no bursts. The sliding window log is accurate but memory-intensive, since it stores every request in the window; the sliding window counter approximates it with two counters per key (current and previous window).
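A minimal single-process token bucket, sketched under the assumption that the caller supplies a non-decreasing millisecond clock (the class and method names are hypothetical): tokens refill continuously at a fixed rate up to the capacity, and each admitted request spends one.

```typescript
// Single-process sketch; TokenBucket is a hypothetical name, not a library class.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full: an idle client may burst up to capacity
    this.lastRefillMs = nowMs;
  }

  tryTake(nowMs: number): boolean {
    // Add tokens for the time elapsed since the last call, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(5, 1, 0); // capacity 5, refills 1 token per second
const firstSeven = Array.from({ length: 7 }, () => bucket.tryTake(0));
console.log(firstSeven.filter(Boolean).length); // 5
console.log(bucket.tryTake(1000)); // true: one token refilled after 1 s
```

After the initial burst the client is held to roughly one request per second. A distributed version would store tokens and lastRefillMs per key and update both atomically.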
429 responses should include a Retry-After header (RFC 7231 §7.1.3, now RFC 9110) stating how many seconds to wait; RFC 6585, which defines status 429, permits it. Some HTTP clients and SDK retry policies honour it automatically.
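For the sliding window log in the Express example, the earliest a rejected client can succeed is when enough of the oldest entries age out. The hypothetical helper below computes that delay as a whole number of seconds suitable for Retry-After, given the admission timestamps still in the window (for instance, the scores read back from the sorted set).

```typescript
// Hypothetical helper for a sliding-window-log limiter: seconds until another request fits.
// timestampsMs: admission times (ms) still inside the window; order does not matter.
function retryAfterSeconds(
  timestampsMs: number[],
  maxRequests: number,
  windowMs: number,
  nowMs: number
): number {
  if (timestampsMs.length < maxRequests) return 0; // a slot is already free
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  // Enough of the oldest entries must age out to leave maxRequests - 1 in the window;
  // the last of those to expire sits at this index.
  const mustExpireMs = sorted[sorted.length - maxRequests];
  // Round up, since Retry-After is a whole number of seconds.
  return Math.max(0, Math.ceil((mustExpireMs + windowMs - nowMs) / 1000));
}

console.log(retryAfterSeconds([10_000, 20_000, 30_000], 3, 60_000, 40_000)); // 30
console.log(retryAfterSeconds([10_000, 20_000], 3, 60_000, 40_000)); // 0
```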
Cloudflare Rate Limiting (managed, layer 7) and AWS API Gateway throttling operate at the edge before traffic reaches the origin; application-level limiting is a backstop for traffic that bypasses the CDN.
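Redis Lua scripts, or a Redis module such as redis-cell (which implements GCRA in native code), evaluate the limit atomically inside Redis. The pipeline approach above is not atomic: commands from concurrent requests can interleave, leaving a small race window between ZCARD and the allow decision. Wrapping the same commands in MULTI/EXEC would make that sequence atomic.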
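GCRA keeps a single value per key, the theoretical arrival time (TAT), rather than a list of requests, and behaves like a leaky bucket used as a meter. The in-memory sketch below shows the core arithmetic with assumed parameters (limit requests per periodMs, with bursts of up to burst requests). It is not redis-cell's code or interface, and a shared deployment would need to run the same read-compute-write step atomically, for example in a Lua script.

```typescript
// Hypothetical in-memory GCRA limiter; names and parameters are illustrative only.
class Gcra {
  private readonly tat = new Map<string, number>(); // theoretical arrival time per key (ms)
  private readonly emissionMs: number; // spacing between requests at the sustained rate
  private readonly burstOffsetMs: number; // how far ahead of "now" the TAT may run

  constructor(limit: number, periodMs: number, burst: number) {
    this.emissionMs = periodMs / limit;
    this.burstOffsetMs = this.emissionMs * burst;
  }

  // Returns 0 if the request is allowed, otherwise the milliseconds to wait.
  check(key: string, nowMs: number): number {
    const tat = Math.max(this.tat.get(key) ?? nowMs, nowMs);
    const newTat = tat + this.emissionMs;
    const allowAt = newTat - this.burstOffsetMs;
    if (nowMs < allowAt) return allowAt - nowMs; // rejected: stored state is unchanged
    this.tat.set(key, newTat);
    return 0;
  }
}

const gcra = new Gcra(10, 1000, 3); // 10 requests per second, bursts of up to 3
console.log([0, 0, 0, 0].map(() => gcra.check('client-a', 0))); // [ 0, 0, 0, 100 ]
console.log(gcra.check('client-a', 100)); // 0
```

Three simultaneous requests pass, the fourth is told to wait 100 ms, and from then on requests are admitted one every 100 ms. Because rejected requests do not move the TAT, retrying clients are not penalised further.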