Performance

Database Denormalization

Strategically duplicating data to accelerate read queries at the cost of write complexity.

Overview

Denormalisation intentionally introduces redundancy into a relational schema by storing derived or duplicated data to improve read performance. Normalised schemas (3NF) minimise redundancy but require JOIN operations; denormalised schemas duplicate data to serve common queries from a single table. The trade-off is faster reads at the cost of more complex writes, storage, and risk of inconsistency.

Origin

Relational normalisation theory was developed by Edgar Codd at IBM (1970-1974). Third Normal Form (3NF) became the standard for OLTP databases. Denormalisation as a deliberate performance technique emerged in the 1980s as relational databases scaled. Data warehousing (star/snowflake schemas) in the 1990s institutionalised denormalisation for analytics. Cassandra and DynamoDB enforced denormalisation by design in the 2010s.

Examples

Counter cache and pre-computed totals in Rails migration

class AddDenormalisedFields < ActiveRecord::Migration[7.1]
  def change
    # Counter cache: avoid COUNT query on every post display
    # Without: Post.all.map { |p| [p.id, p.comments.count] } = N+1
    # With: Post.select(:id, :comments_count) = single query, no JOIN
    add_column :posts, :comments_count, :integer, default: 0, null: false
    add_column :posts, :likes_count, :integer, default: 0, null: false

    # Backfill existing data
    Post.find_each do |post|
      Post.where(id: post.id).update_all(
        comments_count: post.comments.count,
        likes_count: post.likes.count
      )
    end

    # Pre-computed order total: avoid SUM(line_items.price * qty) on every display
    add_column :orders, :subtotal_cents, :integer, null: false, default: 0
    add_column :orders, :total_cents, :integer, null: false, default: 0

    # Customer order summary: avoid expensive aggregation per customer page load
    add_column :customers, :total_orders_count, :integer, default: 0, null: false
    add_column :customers, :lifetime_value_cents, :integer, default: 0, null: false
  end
end

ActiveRecord counter_cache: true in the belongs_to declaration automatically updates the parent's counter column on association create/destroy. For custom counters, use after_create/after_destroy callbacks or DB triggers for consistency.

Materialised view for complex reporting query in PostgreSQL

// Materialised views: a pre-computed result set stored as a table
// Refreshing is an explicit operation (REFRESH MATERIALIZED VIEW)

const createMaterialisedView = async (pool: import('pg').Pool) => {
  await pool.query(`
    CREATE MATERIALIZED VIEW daily_revenue_summary AS
    SELECT
      DATE_TRUNC('day', o.placed_at) AS day,
      p.category,
      COUNT(o.id) AS order_count,
      SUM(o.total_cents) AS revenue_cents,
      AVG(o.total_cents)::int AS avg_order_cents,
      COUNT(DISTINCT o.customer_id) AS unique_customers
    FROM orders o
    JOIN order_line_items li ON li.order_id = o.id
    JOIN products p ON p.id = li.product_id
    WHERE o.status = 'fulfilled'
    GROUP BY 1, 2
    WITH DATA
  `);

  // Index on the materialised view for fast date-range queries
  await pool.query(
    'CREATE INDEX idx_revenue_summary_day ON daily_revenue_summary (day DESC)'
  );
};

// Refresh daily (in a cron job or pg_cron)
const refreshView = async (pool: import('pg').Pool) => {
  // CONCURRENTLY does not block reads but requires a unique index
  await pool.query('REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue_summary');
};

REFRESH MATERIALIZED VIEW CONCURRENTLY (PostgreSQL 9.4+) creates a new version of the view in the background and swaps atomically, preventing read locks during refresh. Without CONCURRENTLY, reads are blocked for the duration of the refresh.

Use Cases

01Dashboard and reporting queries that aggregate millions of rows; materialised views pre-compute the result
02Counter caches on association counts displayed in list views (post.comments_count, product.reviews_count)
03Analytics databases (warehouse/OLAP) where the read:write ratio is very high and normalisation overhead is unjustified
04Search indexes (Elasticsearch, Algolia) that are denormalised copies of database records optimised for full-text and faceted search

When Not to Use

//Do not denormalise OLTP tables with high write volumes where maintaining consistency across duplicated columns adds significant overhead and risk
//Do not denormalise prematurely; normalise first, identify slow queries via profiling, then denormalise only the specific access patterns that are bottlenecks
//Do not denormalise without implementing mechanisms to keep the redundant data consistent; inconsistent denormalised data is worse than a slow normalised query

Technical Notes

PostgreSQL REFRESH MATERIALIZED VIEW WITH NO DATA creates the view structure without populating it; the first REFRESH populates it. CONCURRENTLY requires a unique index on the view
Apache Parquet and Apache ORC are columnar storage formats used in data lakes and warehouses; they are inherently denormalised and compressed, enabling very fast analytical queries via column pruning and predicate pushdown
Event sourcing enables denormalisation without consistency risk: the event log is the source of truth; materialised views (read models) are derived from events and can be rebuilt deterministically. This is the CQRS + event sourcing pattern
Write-time fan-out (used by Twitter's timeline service) is an extreme form of denormalisation: when a tweet is posted, it is written to the follower timelines of all N followers immediately, enabling O(1) timeline reads at the cost of O(N) writes