Safety

Input Validation & Sanitization

Never trust input at the boundary: validate shape, type, and intent before processing.

Overview

Input validation is the first line of defence against injection attacks, data corruption, and unexpected application states. Every value crossing a trust boundary (HTTP request, file upload, message queue payload, IPC) must be validated and sanitised. Validation confirms the value matches the expected shape; sanitisation transforms it into a safe form. Always validate server-side regardless of client-side validation.
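
The difference between the two steps can be sketched in a few lines of TypeScript. The username rule, the function names, and the choice of HTML text as the output sink are assumptions made for illustration; they are not taken from any particular library.

```typescript
// Hypothetical rule: usernames are 3-20 characters of a-z, 0-9, or underscore.
const USERNAME_PATTERN = /^[a-z0-9_]{3,20}$/;

// Validation: accept or reject the value; never modify it.
function isValidUsername(input: unknown): input is string {
  return typeof input === "string" && USERNAME_PATTERN.test(input);
}

// Sanitisation: transform free text into a form that is safe for one
// specific sink (here, insertion into HTML as text content).
function escapeHtml(text: string): string {
  const replacements: Record<string, string> = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return text.replace(/[&<>"']/g, (ch) => replacements[ch] ?? ch);
}
```

Validation suits fields with a known format, where rejecting is better than guessing. Sanitisation suits free text that must be accepted but rendered safely, and it is always specific to the sink: HTML escaping does nothing for SQL or shell contexts.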

Origin

Input validation has been fundamental since the earliest network programs. CERT advisories from the 1990s consistently identified missing validation as the root cause of security vulnerabilities. The OWASP Top 10 has included injection (which stems from missing validation) in every edition since 2003. Libraries like Joi (2014), Zod (2020), and dry-validation (Ruby) formalised schema-driven validation.

Examples

Request validation with Zod in TypeScript

import { z } from 'zod';
import { Request, Response } from 'express';

const CreateOrderSchema = z.object({
  customerId: z.string().uuid('customerId must be a UUID'),
  items: z.array(
    z.object({
      productId: z.string().uuid(),
      quantity: z.number().int().min(1).max(1000),
      unitPriceCents: z.number().int().min(0),
    })
  ).min(1, 'Order must have at least one item').max(50, 'Maximum 50 items per order'),
  couponCode: z.string().max(20).regex(/^[A-Z0-9_-]+$/).optional(),
  shippingAddressId: z.string().uuid(),
});

type CreateOrderInput = z.infer<typeof CreateOrderSchema>;

export async function createOrder(req: Request, res: Response): Promise<void> {
  const result = CreateOrderSchema.safeParse(req.body);
  if (!result.success) {
    res.status(422).json({
      error: { code: 'VALIDATION_ERROR', details: result.error.flatten() }
    });
    return;
  }
  const validated: CreateOrderInput = result.data;
  // validated is now fully typed and safe to use
}

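safeParse never throws; it returns a result object that is either { success: true, data } or { success: false, error }. z.infer derives the TypeScript type from the schema, so the validated object is typed without manual interface duplication. z.object() strips unknown keys from the parsed output by default; call .strict() on the schema to reject them instead.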

Allowlist-based sanitisation in Ruby

require 'sanitize'

ALLOWED_HTML_CONFIG = Sanitize::Config.merge(
  Sanitize::Config::RELAXED,
  elements: %w[p br strong em ul ol li a blockquote code pre],
  attributes: {
    'a' => %w[href title],
    :all => %w[class]
  },
  protocols: {
    'a' => { 'href' => %w[http https mailto] }
  },
  remove_contents: %w[script style iframe]
)

class CommentService
  def sanitize_body(raw_html)
    Sanitize.fragment(raw_html, ALLOWED_HTML_CONFIG)
  end

  def validate_and_save(user:, body:)
    sanitized = sanitize_body(body)
    raise ArgumentError, 'Comment cannot be empty' if sanitized.strip.empty?
    raise ArgumentError, 'Comment too long' if sanitized.length > 5000

    Comment.create!(user: user, body: sanitized)
  end
end

The Sanitize gem, which parses with Nokogiri, reads the HTML and rebuilds it from an allowlist, stripping anything not explicitly permitted. This is safer than blocklist-based approaches (for example, stripping script tags) because elements and attributes the author never anticipated, including ones added to HTML in the future, are blocked by default.
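
To see why a blocklist falls short, consider a deliberately naive filter, sketched here in TypeScript. The function stripScriptTags is hypothetical and exists only to show the failure mode; it is not part of Sanitize or any other library.

```typescript
// Naive blocklist: delete every literal <script> or </script> tag in one pass.
function stripScriptTags(html: string): string {
  return html.replace(/<\/?script>/gi, "");
}

// Nesting the blocked tag inside itself: removing the inner tags
// reassembles the outer fragments into a working script element.
const nestedPayload = "<scr<script>ipt>alert(1)</scr</script>ipt>";
const afterStripping = stripScriptTags(nestedPayload);

// A payload that never uses a script tag passes through untouched.
const handlerPayload = "<img src=x onerror=alert(1)>";
const handlerAfterStripping = stripScriptTags(handlerPayload);

console.log(afterStripping);        // <script>alert(1)</script>
console.log(handlerAfterStripping); // <img src=x onerror=alert(1)>
```

Patching the filter for these two cases only moves the problem to the next encoding or attribute. An allowlist avoids the chase because anything not named in the configuration is dropped.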

Use Cases

  • HTTP request bodies and query parameters validated at the controller layer before any business logic executes
  • File upload validation: verify MIME type from magic bytes (not the Content-Type header), check file size limits, and scan for malware via ClamAV or an external service
  • Webhook payload validation: verify the HMAC signature (Stripe, GitHub) before processing the payload to prevent spoofed events
  • Database write validation: ActiveRecord validations or Ecto changesets as a second layer after controller-level validation
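
The webhook case can be sketched with Node's built-in crypto module. This is a generic HMAC-SHA256 check under stated assumptions: the signature arrives as a hex string and is computed over the raw request body. Real providers define their own header format and signed-payload construction, so the function name and parameters below are illustrative, and the provider's documentation or SDK should be followed in practice.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if signatureHex is the HMAC-SHA256 of rawBody under secret.
// rawBody must be the exact bytes received, not a re-serialised JSON object.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string
): boolean {
  // Reject anything that is not exactly 32 bytes of hex before decoding,
  // so the two buffers compared below always have equal length.
  if (!/^[0-9a-f]{64}$/i.test(signatureHex)) {
    return false;
  }
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(expected, received);
}
```

The check runs before the payload is parsed or acted on, so a spoofed event is rejected without touching business logic.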

When Not to Use

  • Do not rely solely on client-side validation (HTML required attribute, JavaScript form validation); it is trivially bypassed with curl or browser dev tools
  • Do not use a blocklist (reject inputs containing "script") for HTML sanitisation; allowlists are the only reliable approach
  • Do not validate only the structure and skip semantic validation; a quantity of -100 passes integer validation but is semantically invalid
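
The gap between structural and semantic validation can be illustrated with a small hand-written validator. The OrderItem shape and the function name are hypothetical, and the 1 to 1000 quantity range simply mirrors the Zod schema shown earlier.

```typescript
interface OrderItem {
  productId: string;
  quantity: number;
}

type ValidationResult =
  | { ok: true; value: OrderItem }
  | { ok: false; errors: string[] };

function validateOrderItem(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["item must be an object"] };
  }
  const { productId, quantity } = input as Record<string, unknown>;
  const errors: string[] = [];

  // Structural checks: is each field present and of the right type?
  if (typeof productId !== "string" || productId.length === 0) {
    errors.push("productId must be a non-empty string");
  }
  if (typeof quantity !== "number" || !Number.isInteger(quantity)) {
    errors.push("quantity must be an integer");
  } else if (quantity < 1 || quantity > 1000) {
    // Semantic check: -100 is a perfectly good integer but not a valid quantity.
    errors.push("quantity must be between 1 and 1000");
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: { productId: productId as string, quantity: quantity as number },
  };
}
```

Calling validateOrderItem({ productId: "p1", quantity: -100 }) passes every type check and fails only on the range rule, which is the case a structure-only validator would miss.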

Technical Notes

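  • Zod's .transform() runs after validation, so it cannot reject bad input on its own: z.string().transform(Number) turns a non-numeric string into NaN without raising a validation error. z.coerce.number() rejects NaN, but because it calls Number() on the input it still coerces an empty string to 0; where an empty value must be rejected, validate the string first, for example z.string().min(1).pipe(z.coerce.number())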
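  • JSON Schema is the usual format for describing request shapes in OpenAPI specs: OpenAPI 3.1 aligns with JSON Schema draft 2020-12, while OpenAPI 3.0 uses an extended subset of an older draft. Zod schemas can be converted to JSON Schema with zod-to-json-schema for documentation purposes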
  • File-type verification for uploads must read magic bytes (the first few bytes of the file), not the Content-Type header; an attacker can set Content-Type: image/jpeg on a PHP file
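  • GraphQL's type system checks the structure and scalar types of inputs, but not semantic rules such as ranges, lengths, or formats; those still need explicit validation in resolvers or in custom scalar definitions. Unlike REST, where one Zod schema can validate the full body, GraphQL parses fields individually, and a custom scalar that accepts any value without validating it is a common oversight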
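
The magic-byte check mentioned above can be sketched as a lookup against known file signatures. The three signatures listed are the standard leading bytes for JPEG, PNG, and PDF; the type and function names are hypothetical, and a production system would more likely rely on a maintained detection library with a fuller signature table.

```typescript
type DetectedType = "image/jpeg" | "image/png" | "application/pdf";

// Leading bytes that identify each format, independent of any header
// or file extension supplied by the client.
const SIGNATURES: ReadonlyArray<{ mime: DetectedType; bytes: readonly number[] }> = [
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Returns the detected type, or null if the content matches no allowed signature.
function detectMimeType(data: Uint8Array): DetectedType | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return mime;
    }
  }
  return null;
}

// Compare the detected type with what the client claimed; reject on mismatch.
function isUploadTypeConsistent(data: Uint8Array, claimedContentType: string): boolean {
  const detected = detectMimeType(data);
  return detected !== null && detected === claimedContentType;
}
```

A PHP script uploaded with Content-Type: image/jpeg begins with the bytes for "<?php", matches no signature, and is rejected. A matching signature only shows that the file starts like the claimed format, so size limits and malware scanning still apply.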