Your application works perfectly on your laptop. It handles your demo beautifully. Your investor pitch is flawless. Then real users show up, and everything you thought you knew about software engineering turns out to be a fairy tale you told yourself while localhost returned responses in 3ms.

Scaling an application from near-zero to over a million active users is humbling. The bottlenecks are never where you expect them. The failures are never what you planned for. And the solutions that work at 10K users will actively hurt you at 100K.

This is the playbook worth having before the first 3 AM wake-up call. Every code example is realistic for production. Every recommendation comes from painful experience, not from reading somebody else's blog post about "web scale."

The Scaling Journey: What Breaks and When#

Scaling is not a continuous process. It is a series of phase transitions. Your application works fine, works fine, works fine, and then suddenly it does not. Understanding these breakpoints saves you months of reactive firefighting.

0-100 Users: The "Everything Works" Phase#

At this stage, your single server handles everything. PostgreSQL, your application, Redis, maybe even Nginx — all on the same box. Response times are fast because there is no contention. The database returns queries in microseconds because the entire dataset fits in the buffer cache. You feel smart.

The only thing that matters here: do not add complexity you do not need yet. Plenty of teams set up Kubernetes clusters for 50 users. They spend three months on infrastructure and zero months on product. Then they shut down because nobody wanted the product.

typescript

// At 100 users, this is your entire infrastructure
// One server, one database, one process
 
import express from "express";
import { Pool } from "pg";
 
const pool = new Pool({
  host: "localhost", // yes, localhost. the db is on the same box
  max: 10, // 10 connections is plenty
  idleTimeoutMillis: 30000,
});
 
const app = express();
 
app.get("/api/users/:id", async (req, res) => {
  const { rows } = await pool.query("SELECT id, name, email FROM users WHERE id = $1", [req.params.id]);
  res.json(rows[0]);
});
 
app.listen(3000);
// This handles 100 users without breaking a sweat

100-1K Users: The First Real Problems#

This is where you discover that the queries you wrote during prototyping are garbage. Table scans on 50K rows. Missing indexes on foreign keys. N+1 queries in every list endpoint. Your response times go from 50ms to 2 seconds and you have no idea why because you do not have monitoring.

What breaks:

Unindexed queries start showing up
Session storage in memory causes problems during restarts
File uploads saturate your single disk
Your server process crashes and there is no restart mechanism

What to fix:

Add an index on every column you filter by or join on
Move session storage to Redis
Add a process manager (PM2 or systemd)
Set up basic monitoring (at minimum, response times and error rates)

typescript

// The query that kills you at 1K users
// This is what every ORM generates by default
 
// BAD: Full table scan, no pagination
app.get("/api/posts", async (req, res) => {
  const posts = await db.query("SELECT * FROM posts ORDER BY created_at DESC");
  // You are loading every post into memory
  // At 50K posts, this takes 3 seconds and uses 200MB of RAM
  res.json(posts.rows);
});
 
// GOOD: Indexed, paginated, selecting only what you need
app.get("/api/posts", async (req, res) => {
  const cursor = req.query.cursor;
  const limit = Math.min(Number(req.query.limit) || 20, 100);
 
  const { rows } = await db.query(
    `SELECT id, title, excerpt, author_id, created_at
     FROM posts
     WHERE ($1::timestamptz IS NULL OR created_at < $1)
     ORDER BY created_at DESC
     LIMIT $2`,
    [cursor || null, limit + 1],
  );
 
  const hasMore = rows.length > limit;
  const posts = hasMore ? rows.slice(0, -1) : rows;
 
  res.json({
    posts,
    nextCursor: hasMore ? posts[posts.length - 1].created_at : null,
  });
});
 
// And the index that makes it work
// CREATE INDEX idx_posts_created_at ON posts (created_at DESC);

1K-10K Users: You Need a Real Architecture#

Your single server is running hot. CPU is at 80% during peak hours. The database connection pool is constantly maxed out. You are getting occasional timeouts. Users complain that "the site is slow sometimes."

What breaks:

Database connections become the bottleneck
Background tasks (sending emails, processing images) block request handling
A single server means zero redundancy — any deployment means downtime
Memory leaks that were invisible at low traffic now cause crashes every few days

What to fix:

Separate your database onto its own server
Add a reverse proxy (Nginx) in front of your application
Implement background job processing
Add a CDN for static assets
Move to at least two application servers behind a load balancer

typescript

// At this stage, you need proper connection pooling
// This is not optional — it is the difference between working and not
 
import { Pool } from "pg";
 
const pool = new Pool({
  host: process.env.DB_HOST, // separate DB server now
  port: 5432,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20, // per-process limit
  min: 5, // keep connections warm
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 5000, // fail fast if pool exhausted
  statement_timeout: "10000", // kill queries over 10 seconds
  // THIS IS THE ONE EVERYONE FORGETS
  // Without it, a single slow query can starve the pool
  query_timeout: 15000,
});
 
// Monitor your pool — you'll thank me later
setInterval(() => {
  const { totalCount, idleCount, waitingCount } = pool;
  console.log(
    JSON.stringify({
      level: "info",
      msg: "pool_stats",
      total: totalCount,
      idle: idleCount,
      waiting: waitingCount,
      active: totalCount - idleCount,
    }),
  );
  if (waitingCount > 5) {
    console.log(
      JSON.stringify({
        level: "warn",
        msg: "pool_pressure",
        waiting: waitingCount,
      }),
    );
  }
}, 10000);

10K-100K Users: The Architecture Phase Transition#

This is where amateur hour ends. Everything that was "good enough" stops being good enough. Your database needs read replicas. Your caching needs to be strategic, not opportunistic. Your deployments need to be zero-downtime. You need real monitoring, not "I'll check the logs when something breaks."

What breaks:

Database write throughput becomes the bottleneck
Cache invalidation becomes a real problem (not just "set a TTL")
Your monolith is too big for one developer to understand
Deployment fear — every deploy is risky because the blast radius is the entire application
Operational overhead: you spend more time fighting fires than building features

What to fix:

Database read replicas with query routing
Multi-layer caching strategy
Proper CI/CD with automated testing and canary deployments
APM and distributed tracing
Begin extracting high-traffic paths into separate services

typescript

// Database query routing between primary and replicas
// This is the pattern that lets you scale reads independently
 
import { Pool } from "pg";
 
interface DatabaseCluster {
  primary: Pool;
  replicas: Pool[];
  currentReplica: number;
}
 
function createCluster(): DatabaseCluster {
  const primary = new Pool({
    host: process.env.DB_PRIMARY_HOST,
    max: 20,
    min: 5,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
  });
 
  const replicaHosts = (process.env.DB_REPLICA_HOSTS || "").split(",");
  const replicas = replicaHosts.map(
    (host) =>
      new Pool({
        host,
        max: 30, // replicas handle more connections (read-heavy)
        min: 5,
        idleTimeoutMillis: 30000,
        connectionTimeoutMillis: 5000,
      }),
  );
 
  return { primary, replicas, currentReplica: 0 };
}
 
const cluster = createCluster();
 
// Round-robin replica selection
function getReadPool(): Pool {
  if (cluster.replicas.length === 0) return cluster.primary;
  const pool = cluster.replicas[cluster.currentReplica];
  cluster.currentReplica = (cluster.currentReplica + 1) % cluster.replicas.length;
  return pool;
}
 
// The key insight: route by operation type
export async function query(sql: string, params?: unknown[], options?: { forcePrimary?: boolean }) {
  // Writes always go to primary
  // Reads go to replicas unless you need strong consistency
  const isWrite = /^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i.test(sql);
  const pool = isWrite || options?.forcePrimary ? cluster.primary : getReadPool();
 
  const start = performance.now();
  try {
    const result = await pool.query(sql, params);
    const duration = performance.now() - start;
 
    if (duration > 1000) {
      console.log(
        JSON.stringify({
          level: "warn",
          msg: "slow_query",
          sql: sql.substring(0, 200),
          duration_ms: Math.round(duration),
          target: isWrite ? "primary" : "replica",
        }),
      );
    }
 
    return result;
  } catch (error) {
    // If a replica fails, fall back to primary
    if (!isWrite && !options?.forcePrimary) {
      console.log(
        JSON.stringify({
          level: "warn",
          msg: "replica_fallback",
          error: (error as Error).message,
        }),
      );
      return cluster.primary.query(sql, params);
    }
    throw error;
  }
}

100K-1M Users: You Are an Infrastructure Company Now#

Congratulations, you are no longer building a product. You are operating a distributed system, and distributed systems have failure modes that will make you question your career choices.

What breaks:

Everything you thought was reliable turns out to have a failure rate that matters at scale
Network partitions are no longer theoretical
Clock skew causes race conditions in distributed locks
Your monitoring generates so much data that monitoring the monitoring becomes a problem
A single bad deploy can cost you hundreds of thousands of dollars in lost revenue

What to fix:

This is the rest of this blog post

Database Scaling: The Foundation of Everything#

To be direct: 90% of scaling problems are database problems in disguise. Your application is stateless (or should be). Your cache is ephemeral. But your database is where truth lives, and truth is expensive to compute, dangerous to duplicate, and devastating to lose.

Connection Pooling: The First Thing That Breaks#

Every database connection consumes memory on the database server — typically 5-10MB per connection for PostgreSQL. With 10 application servers each maintaining a pool of 20 connections, you have 200 connections. PostgreSQL's default max_connections is 100. You do the math.

The solution is an external connection pooler. It sits between your application and your database, multiplexing hundreds of application connections onto a smaller number of database connections.

sql

-- PostgreSQL configuration for a server handling 100K+ users
-- These are the settings that actually matter
 
-- Connection limits
max_connections = 300               -- Don't go higher; use a pooler instead
superuser_reserved_connections = 3
 
-- Memory (for a 16GB RAM server)
shared_buffers = '4GB'              -- 25% of RAM
effective_cache_size = '12GB'       -- 75% of RAM
work_mem = '64MB'                   -- Per-operation sort/hash memory
maintenance_work_mem = '512MB'      -- For VACUUM, CREATE INDEX
 
-- WAL and replication
wal_level = 'replica'
max_wal_senders = 10
max_replication_slots = 10
wal_buffers = '64MB'
checkpoint_completion_target = 0.9
 
-- Query planner
random_page_cost = 1.1             -- Assuming SSD storage
effective_io_concurrency = 200     -- SSD can handle parallel I/O
default_statistics_target = 200    -- Better query plans, slightly slower ANALYZE
 
-- Logging (you need this for debugging)
log_min_duration_statement = 500   -- Log queries over 500ms
log_checkpoints = on
log_connections = off              -- Too noisy at scale
log_disconnections = off
log_lock_waits = on
log_temp_files = 0                 -- Log ALL temp file usage

Here is how to implement application-level connection pooling properly:

typescript

// Production connection pool with health checking and metrics
 
import { Pool, PoolClient, QueryResult } from "pg";
 
interface PoolMetrics {
  totalQueries: number;
  failedQueries: number;
  slowQueries: number;
  poolExhausted: number;
  averageQueryTime: number;
}
 
class DatabasePool {
  private pool: Pool;
  private metrics: PoolMetrics = {
    totalQueries: 0,
    failedQueries: 0,
    slowQueries: 0,
    poolExhausted: 0,
    averageQueryTime: 0,
  };
  private queryTimes: number[] = [];
  private readonly SLOW_QUERY_MS = 1000;
  private readonly MAX_QUERY_TIME_SAMPLES = 1000;
 
  constructor(config: {
    host: string;
    port: number;
    database: string;
    user: string;
    password: string;
    maxConnections: number;
  }) {
    this.pool = new Pool({
      host: config.host,
      port: config.port,
      database: config.database,
      user: config.user,
      password: config.password,
      max: config.maxConnections,
      min: Math.max(2, Math.floor(config.maxConnections / 4)),
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 5000,
      allowExitOnIdle: false,
    });
 
    this.pool.on("error", (err) => {
      // Idle client errors — connection dropped by server/network
      console.error(
        JSON.stringify({
          level: "error",
          msg: "pool_idle_error",
          error: err.message,
        }),
      );
    });
 
    this.pool.on("connect", () => {
      // New connection established — useful for debugging pool behavior
    });
  }
 
  async query<T = unknown>(sql: string, params?: unknown[]): Promise<QueryResult<T>> {
    const start = performance.now();
    this.metrics.totalQueries++;
 
    // Check pool pressure before acquiring
    if (this.pool.waitingCount > 10) {
      this.metrics.poolExhausted++;
      console.log(
        JSON.stringify({
          level: "warn",
          msg: "pool_backpressure",
          waiting: this.pool.waitingCount,
          sql: sql.substring(0, 100),
        }),
      );
    }
 
    try {
      const result = await this.pool.query<T>(sql, params);
      const duration = performance.now() - start;
 
      this.recordQueryTime(duration);
 
      if (duration > this.SLOW_QUERY_MS) {
        this.metrics.slowQueries++;
        console.log(
          JSON.stringify({
            level: "warn",
            msg: "slow_query",
            duration_ms: Math.round(duration),
            sql: sql.substring(0, 200),
            rows: result.rowCount,
          }),
        );
      }
 
      return result;
    } catch (error) {
      this.metrics.failedQueries++;
      const duration = performance.now() - start;
      console.error(
        JSON.stringify({
          level: "error",
          msg: "query_error",
          duration_ms: Math.round(duration),
          sql: sql.substring(0, 200),
          error: (error as Error).message,
        }),
      );
      throw error;
    }
  }
 
  // Transaction support with automatic rollback
  async transaction<T>(fn: (client: PoolClient) => Promise<T>): Promise<T> {
    const client = await this.pool.connect();
    try {
      await client.query("BEGIN");
      const result = await fn(client);
      await client.query("COMMIT");
      return result;
    } catch (error) {
      await client.query("ROLLBACK");
      throw error;
    } finally {
      client.release();
    }
  }
 
  getMetrics(): PoolMetrics {
    return {
      ...this.metrics,
      averageQueryTime: this.calculateAvgQueryTime(),
    };
  }
 
  getPoolStats() {
    return {
      total: this.pool.totalCount,
      idle: this.pool.idleCount,
      waiting: this.pool.waitingCount,
      active: this.pool.totalCount - this.pool.idleCount,
    };
  }
 
  private recordQueryTime(duration: number) {
    this.queryTimes.push(duration);
    if (this.queryTimes.length > this.MAX_QUERY_TIME_SAMPLES) {
      this.queryTimes.shift();
    }
  }
 
  private calculateAvgQueryTime(): number {
    if (this.queryTimes.length === 0) return 0;
    const sum = this.queryTimes.reduce((a, b) => a + b, 0);
    return Math.round(sum / this.queryTimes.length);
  }
 
  async healthCheck(): Promise<boolean> {
    try {
      const result = await this.pool.query("SELECT 1 as health");
      return result.rows[0]?.health === 1;
    } catch {
      return false;
    }
  }
 
  async shutdown(): Promise<void> {
    await this.pool.end();
  }
}
 
export const db = new DatabasePool({
  host: process.env.DB_HOST!,
  port: Number(process.env.DB_PORT) || 5432,
  database: process.env.DB_NAME!,
  user: process.env.DB_USER!,
  password: process.env.DB_PASSWORD!,
  maxConnections: Number(process.env.DB_MAX_CONNECTIONS) || 20,
});

The N+1 Problem: Death by a Thousand Queries#

The N+1 problem is the single most common performance issue in web applications, and it is almost always introduced by ORMs that make it too easy to be lazy.

typescript

// THE N+1 PROBLEM: What most APIs actually do
 
// A request for "list posts with authors" generates:
// 1 query: SELECT * FROM posts LIMIT 20
// 20 queries: SELECT * FROM users WHERE id = ?  (one per post)
// Total: 21 queries for 20 posts
 
// BAD: The ORM makes this look innocent
async function getPostsWithAuthors() {
  const posts = await Post.findAll({ limit: 20 });
  // This triggers 20 additional queries behind the scenes
  for (const post of posts) {
    post.author = await User.findByPk(post.authorId);
  }
  return posts;
}
 
// GOOD: One query with a JOIN
async function getPostsWithAuthors(db: DatabasePool) {
  const { rows } = await db.query(`
    SELECT
      p.id,
      p.title,
      p.excerpt,
      p.created_at,
      json_build_object(
        'id', u.id,
        'name', u.name,
        'avatar_url', u.avatar_url
      ) as author
    FROM posts p
    INNER JOIN users u ON u.id = p.author_id
    ORDER BY p.created_at DESC
    LIMIT 20
  `);
  return rows;
}
 
// EVEN BETTER: Batch loading with DataLoader pattern
// This is what Facebook built for GraphQL, and it works for REST too
 
class BatchLoader<K, V> {
  private batch: Map<
    K,
    {
      resolve: (value: V | null) => void;
      reject: (error: Error) => void;
    }[]
  > = new Map();
  private timer: ReturnType<typeof setTimeout> | null = null;
 
  constructor(private loadFn: (keys: K[]) => Promise<Map<K, V>>) {}
 
  async load(key: K): Promise<V | null> {
    return new Promise((resolve, reject) => {
      const existing = this.batch.get(key) || [];
      existing.push({ resolve, reject });
      this.batch.set(key, existing);
 
      if (!this.timer) {
        // Batch all loads within the same tick
        this.timer = setTimeout(() => this.executeBatch(), 0);
      }
    });
  }
 
  private async executeBatch() {
    const batch = this.batch;
    this.batch = new Map();
    this.timer = null;
 
    const keys = Array.from(batch.keys());
    try {
      const results = await this.loadFn(keys);
      for (const [key, callbacks] of batch) {
        const value = results.get(key) ?? null;
        callbacks.forEach((cb) => cb.resolve(value));
      }
    } catch (error) {
      for (const callbacks of batch.values()) {
        callbacks.forEach((cb) => cb.reject(error as Error));
      }
    }
  }
}
 
// Usage
const userLoader = new BatchLoader<string, User>(async (ids) => {
  const { rows } = await db.query(`SELECT * FROM users WHERE id = ANY($1)`, [ids]);
  return new Map(rows.map((u) => [u.id, u]));
});
 
// Now 20 calls to userLoader.load(id) result in 1 query
// SELECT * FROM users WHERE id = ANY('{id1,id2,...,id20}')

Indexing Strategy: Think Like the Query Planner#

Most developers add indexes reactively — a query is slow, they add an index, it gets faster. That works until you have 50 indexes on a table and your write throughput drops to zero because every INSERT has to update all 50 indexes.

sql

-- INDEXING STRATEGY FOR A SOCIAL MEDIA APP AT SCALE
 
-- Rule 1: Index what you filter, join, and order by
-- But ONLY those columns, and in the right order
 
-- Posts table
-- Primary query: "Get recent posts by user"
CREATE INDEX idx_posts_user_created
  ON posts (user_id, created_at DESC);
-- This covers: WHERE user_id = X ORDER BY created_at DESC
-- The column order matters! user_id first because it's the equality filter
 
-- Rule 2: Partial indexes for common filters
-- 90% of queries only want published posts
CREATE INDEX idx_posts_published_recent
  ON posts (created_at DESC)
  WHERE status = 'published';
-- This index is smaller and faster than indexing all posts
 
-- Rule 3: Covering indexes to avoid table lookups
-- If a query can be satisfied entirely from the index, PostgreSQL
-- never touches the heap (the actual table data)
CREATE INDEX idx_posts_feed_covering
  ON posts (user_id, created_at DESC)
  INCLUDE (title, excerpt, like_count);
-- Now "get feed items" never reads the table
 
-- Rule 4: Expression indexes for computed filters
-- You search by lowercase email everywhere
CREATE UNIQUE INDEX idx_users_email_lower
  ON users (LOWER(email));
-- Make sure your queries use LOWER(email) too
 
-- Rule 5: GIN indexes for full-text search and JSONB
CREATE INDEX idx_posts_search
  ON posts USING gin(to_tsvector('english', title || ' ' || body));
-- For JSONB columns
CREATE INDEX idx_user_preferences
  ON users USING gin(preferences jsonb_path_ops);
 
-- ANTI-PATTERNS: Indexes that hurt more than they help
 
-- BAD: Indexing boolean columns with low selectivity
-- CREATE INDEX idx_posts_active ON posts (is_active);
-- If 95% of posts are active, this index is useless
-- PostgreSQL will do a seq scan anyway because the index isn't selective
 
-- BAD: Too many indexes on a write-heavy table
-- Every INSERT/UPDATE/DELETE must update ALL indexes
-- 10 indexes on a table that gets 1000 writes/second = 10,000 index updates/second
 
-- DIAGNOSTIC: Find unused indexes (run periodically)
SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
  AND indexrelname NOT LIKE '%_unique'
ORDER BY pg_relation_size(indexrelid) DESC;
 
-- DIAGNOSTIC: Find missing indexes (queries doing seq scans)
SELECT
  relname AS table_name,
  seq_scan,
  seq_tup_read,
  idx_scan,
  CASE WHEN seq_scan > 0
    THEN seq_tup_read / seq_scan
    ELSE 0
  END AS avg_rows_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 20;

Table Partitioning: When Your Tables Get Too Big#

When a single table grows past tens of millions of rows, even indexed queries start slowing down. Partitioning splits one logical table into multiple physical tables, and PostgreSQL transparently routes queries to the right partition.

sql

-- Time-based partitioning for an events/analytics table
-- This is the most common and most useful partitioning strategy
 
CREATE TABLE events (
  id          bigserial,
  user_id     uuid NOT NULL,
  event_type  text NOT NULL,
  payload     jsonb,
  created_at  timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
 
-- Create monthly partitions
CREATE TABLE events_2026_01 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
  FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
 
-- Indexes on partitions (inherited automatically since PG 11)
CREATE INDEX ON events (user_id, created_at DESC);
CREATE INDEX ON events (event_type, created_at DESC);
 
-- Auto-create future partitions with a cron job
-- This is the script you run monthly:
DO $$
DECLARE
  partition_date date;
  partition_name text;
  start_date text;
  end_date text;
BEGIN
  -- Create partitions for the next 3 months
  FOR i IN 0..2 LOOP
    partition_date := date_trunc('month', now() + (i || ' months')::interval);
    partition_name := 'events_' || to_char(partition_date, 'YYYY_MM');
    start_date := to_char(partition_date, 'YYYY-MM-DD');
    end_date := to_char(partition_date + '1 month'::interval, 'YYYY-MM-DD');
 
    IF NOT EXISTS (
      SELECT 1 FROM pg_class WHERE relname = partition_name
    ) THEN
      EXECUTE format(
        'CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
        partition_name, start_date, end_date
      );
      RAISE NOTICE 'Created partition %', partition_name;
    END IF;
  END LOOP;
END $$;
 
-- Dropping old data is now instant — just drop the partition
-- Instead of: DELETE FROM events WHERE created_at < '2025-01-01'
-- (which locks the table and takes hours)
-- Do this:
DROP TABLE events_2025_01;
-- Instant. No locks. No vacuum needed.

Materialized Views: Precomputing Expensive Queries#

Some queries are inherently expensive. Aggregations across millions of rows, complex joins across multiple tables, analytics dashboards. You cannot make them fast with indexes alone. Materialized views precompute the results and let you query a flat table instead.

sql

-- Dashboard analytics that would take 30 seconds to compute live
CREATE MATERIALIZED VIEW mv_daily_stats AS
SELECT
  date_trunc('day', e.created_at) AS day,
  COUNT(DISTINCT e.user_id) AS unique_users,
  COUNT(*) AS total_events,
  COUNT(*) FILTER (WHERE e.event_type = 'purchase') AS purchases,
  COALESCE(SUM(
    CASE WHEN e.event_type = 'purchase'
    THEN (e.payload->>'amount')::numeric
    ELSE 0 END
  ), 0) AS revenue,
  COUNT(*) FILTER (WHERE e.event_type = 'signup') AS signups,
  ROUND(
    COUNT(*) FILTER (WHERE e.event_type = 'purchase')::numeric /
    NULLIF(COUNT(DISTINCT e.user_id), 0) * 100, 2
  ) AS conversion_rate
FROM events e
WHERE e.created_at >= now() - INTERVAL '90 days'
GROUP BY date_trunc('day', e.created_at)
ORDER BY day DESC
WITH DATA;
 
-- Unique index enables CONCURRENTLY refresh (no locks)
CREATE UNIQUE INDEX ON mv_daily_stats (day);
 
-- Refresh without blocking readers
-- Run this via cron every 15 minutes
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_stats;
 
-- Now your dashboard query goes from 30 seconds to 5ms:
SELECT * FROM mv_daily_stats WHERE day >= now() - INTERVAL '30 days';

typescript

// Automated materialized view refresh with error handling
 
interface MaterializedViewConfig {
  name: string;
  refreshIntervalMs: number;
  concurrent: boolean;
}
 
class MaterializedViewRefresher {
  private timers: Map<string, ReturnType<typeof setInterval>> = new Map();
 
  constructor(
    private db: DatabasePool,
    private views: MaterializedViewConfig[],
  ) {}
 
  start() {
    for (const view of this.views) {
      const timer = setInterval(() => this.refresh(view), view.refreshIntervalMs);
      this.timers.set(view.name, timer);
 
      // Also refresh immediately on start
      this.refresh(view).catch(() => {});
    }
  }
 
  private async refresh(view: MaterializedViewConfig) {
    const start = performance.now();
    const concurrent = view.concurrent ? "CONCURRENTLY" : "";
 
    try {
      await this.db.query(`REFRESH MATERIALIZED VIEW ${concurrent} ${view.name}`);
      const duration = performance.now() - start;
      console.log(
        JSON.stringify({
          level: "info",
          msg: "mv_refreshed",
          view: view.name,
          duration_ms: Math.round(duration),
        }),
      );
    } catch (error) {
      const duration = performance.now() - start;
      console.error(
        JSON.stringify({
          level: "error",
          msg: "mv_refresh_failed",
          view: view.name,
          duration_ms: Math.round(duration),
          error: (error as Error).message,
        }),
      );
    }
  }
 
  stop() {
    for (const timer of this.timers.values()) {
      clearInterval(timer);
    }
    this.timers.clear();
  }
}
 
// Usage
const refresher = new MaterializedViewRefresher(db, [
  { name: "mv_daily_stats", refreshIntervalMs: 15 * 60 * 1000, concurrent: true },
  { name: "mv_user_activity", refreshIntervalMs: 5 * 60 * 1000, concurrent: true },
  { name: "mv_popular_posts", refreshIntervalMs: 60 * 1000, concurrent: true },
]);
 
refresher.start();

Caching Architecture: The Layer Between You and Disaster#

Caching is easy to add and incredibly hard to get right. The gap between "add a Redis cache" and "implement a caching strategy that does not serve stale data, does not stampede your database, and does not use more memory than your dataset" is enormous.

Cache-Aside Pattern: The Workhorse#

Cache-aside is the most common caching pattern. The application checks the cache first; on miss, it loads from the database and populates the cache. Simple, but the devil is in the details.

typescript

import Redis from "ioredis";
 
const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: Number(process.env.REDIS_PORT) || 6379,
  password: process.env.REDIS_PASSWORD,
  maxRetriesPerRequest: 3,
  retryStrategy(times) {
    const delay = Math.min(times * 50, 2000);
    return delay;
  },
  // THIS IS CRITICAL: enable lazy connect so the app starts
  // even if Redis is down
  lazyConnect: true,
  enableReadyCheck: true,
});
 
// The naive approach everyone starts with
async function getUserNaive(id: string): Promise<User | null> {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
 
  const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
  if (rows[0]) {
    await redis.set(`user:${id}`, JSON.stringify(rows[0]), "EX", 3600);
  }
  return rows[0] || null;
}
 
// What is wrong with the naive approach:
// 1. No protection against cache stampede
// 2. No circuit breaker when Redis is down
// 3. No serialization error handling
// 4. No cache warming
// 5. Stale data served indefinitely if invalidation fails
 
// The production version:
 
interface CacheOptions {
  ttlSeconds: number;
  staleWhileRevalidateSeconds?: number;
  lockTimeoutMs?: number;
}
 
class CacheManager {
  private redis: Redis;
  private localCache: Map<string, { value: string; expiry: number }> = new Map();
  private readonly LOCAL_TTL_MS = 5000; // L1 cache: 5 seconds
  private redisHealthy = true;
 
  constructor(redis: Redis) {
    this.redis = redis;
 
    // Health monitoring
    this.redis.on("error", () => {
      this.redisHealthy = false;
    });
    this.redis.on("ready", () => {
      this.redisHealthy = true;
    });
 
    // Clean expired local cache entries every 30 seconds
    setInterval(() => this.cleanLocalCache(), 30000);
  }
 
  async get<T>(key: string, loader: () => Promise<T>, options: CacheOptions): Promise<T> {
    // L1: Check local in-memory cache first (no network call)
    const local = this.localCache.get(key);
    if (local && local.expiry > Date.now()) {
      return JSON.parse(local.value) as T;
    }
 
    // L2: Check Redis
    if (this.redisHealthy) {
      try {
        const cached = await this.redis.get(key);
        if (cached !== null) {
          // Populate L1
          this.localCache.set(key, {
            value: cached,
            expiry: Date.now() + this.LOCAL_TTL_MS,
          });
 
          // Check if we should revalidate in background
          if (options.staleWhileRevalidateSeconds) {
            const ttl = await this.redis.ttl(key);
            const staleThreshold = options.ttlSeconds - options.staleWhileRevalidateSeconds;
            if (ttl > 0 && ttl < staleThreshold) {
              // Serve stale, refresh in background
              this.revalidateInBackground(key, loader, options);
            }
          }
 
          return JSON.parse(cached) as T;
        }
      } catch {
        // Redis failed, fall through to loader
      }
    }
 
    // Cache miss — load from source with stampede protection
    return this.loadWithLock(key, loader, options);
  }
 
  private async loadWithLock<T>(key: string, loader: () => Promise<T>, options: CacheOptions): Promise<T> {
    const lockKey = `lock:${key}`;
    const lockTimeout = options.lockTimeoutMs || 5000;
 
    if (this.redisHealthy) {
      // Try to acquire lock (only one process loads from DB)
      const acquired = await this.redis.set(lockKey, "1", "PX", lockTimeout, "NX");
 
      if (!acquired) {
        // Another process is loading — wait and retry from cache
        await new Promise((resolve) => setTimeout(resolve, 100));
        const cached = await this.redis.get(key);
        if (cached) return JSON.parse(cached) as T;
        // Still no cache — load anyway (lock holder might have failed)
      }
    }
 
    try {
      const value = await loader();
      const serialized = JSON.stringify(value);
 
      // Populate both cache layers
      this.localCache.set(key, {
        value: serialized,
        expiry: Date.now() + this.LOCAL_TTL_MS,
      });
 
      if (this.redisHealthy) {
        await this.redis.set(key, serialized, "EX", options.ttlSeconds);
      }
 
      return value;
    } finally {
      if (this.redisHealthy) {
        await this.redis.del(lockKey).catch(() => {});
      }
    }
  }
 
  private async revalidateInBackground<T>(key: string, loader: () => Promise<T>, options: CacheOptions) {
    // Fire and forget — do not await
    this.loadWithLock(key, loader, options).catch((error) => {
      console.error(
        JSON.stringify({
          level: "error",
          msg: "background_revalidation_failed",
          key,
          error: (error as Error).message,
        }),
      );
    });
  }
 
  async invalidate(...keys: string[]) {
    // Invalidate both layers
    for (const key of keys) {
      this.localCache.delete(key);
    }
    if (this.redisHealthy && keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
 
  async invalidatePattern(pattern: string) {
    // Invalidate local cache entries matching pattern
    const regex = new RegExp("^" + pattern.replace(/\*/g, ".*") + "$");
    for (const key of this.localCache.keys()) {
      if (regex.test(key)) this.localCache.delete(key);
    }
 
    // Invalidate Redis entries
    if (this.redisHealthy) {
      let cursor = "0";
      do {
        const [nextCursor, keys] = await this.redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
        cursor = nextCursor;
        if (keys.length > 0) {
          await this.redis.del(...keys);
        }
      } while (cursor !== "0");
    }
  }
 
  private cleanLocalCache() {
    const now = Date.now();
    for (const [key, entry] of this.localCache) {
      if (entry.expiry < now) {
        this.localCache.delete(key);
      }
    }
  }
}
 
// Usage
const cache = new CacheManager(redis);
 
async function getUser(id: string): Promise<User | null> {
  return cache.get(
    `user:${id}`,
    async () => {
      const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
      return rows[0] || null;
    },
    {
      ttlSeconds: 3600,
      staleWhileRevalidateSeconds: 300,
      lockTimeoutMs: 5000,
    },
  );
}
 
// On user update, invalidate
async function updateUser(id: string, data: Partial<User>) {
  await db.query("UPDATE users SET name = $2, email = $3 WHERE id = $1", [id, data.name, data.email]);
  await cache.invalidate(`user:${id}`);
  // Also invalidate any list caches that might include this user
  await cache.invalidatePattern(`user-list:*`);
}

Multi-Layer Caching Strategy#

Production applications need multiple cache layers. Each layer has different characteristics:

typescript

// The full caching stack, from fastest to slowest:
//
// L0: CDN Edge Cache     — 50ms globally, 0 server load
// L1: In-Memory Cache    — <1ms, per-process, small capacity
// L2: Redis Cache        — 1-5ms, shared across processes, large capacity
// L3: Database           — 5-50ms, persistent, source of truth
//
// Strategy: Popular data should "float up" to faster layers
 
interface MultiLayerCacheConfig {
  key: string;
  l1TtlMs: number; // In-memory TTL
  l2TtlSeconds: number; // Redis TTL
  cdnMaxAge?: number; // CDN cache-control max-age
  cdnStaleWhileRevalidate?: number;
}
 
// HTTP response with appropriate cache headers for each layer
function setCacheHeaders(res: Response, config: { isPublic: boolean; maxAge: number; staleWhileRevalidate?: number }) {
  const directives: string[] = [];
 
  if (config.isPublic) {
    directives.push("public");
  } else {
    directives.push("private");
  }
 
  directives.push(`max-age=${config.maxAge}`);
 
  if (config.staleWhileRevalidate) {
    directives.push(`stale-while-revalidate=${config.staleWhileRevalidate}`);
  }
 
  res.setHeader("Cache-Control", directives.join(", "));
 
  // Vary header prevents cache poisoning
  res.setHeader("Vary", "Accept-Encoding, Accept-Language");
}
 
// Example: Product listing page
// CDN: 60s max-age, 300s stale-while-revalidate
// Redis: 300s TTL
// DB: Fallback source of truth
 
async function getProducts(req: Request, res: Response) {
  const page = Number(req.query.page) || 1;
  const category = req.query.category as string;
  const cacheKey = `products:${category}:page${page}`;
 
  const products = await cache.get(
    cacheKey,
    async () => {
      const { rows } = await db.query(
        `SELECT id, name, price, image_url, rating
         FROM products
         WHERE category = $1 AND active = true
         ORDER BY popularity DESC
         LIMIT 20 OFFSET $2`,
        [category, (page - 1) * 20],
      );
      return rows;
    },
    { ttlSeconds: 300, staleWhileRevalidateSeconds: 60 },
  );
 
  // Tell CDN and browser how to cache this response
  setCacheHeaders(res, {
    isPublic: true, // no user-specific data
    maxAge: 60, // CDN serves cached version for 60s
    staleWhileRevalidate: 300, // serve stale for up to 300s while refreshing
  });
 
  res.json(products);
}
 
// Example: User profile (private, no CDN)
async function getUserProfile(req: Request, res: Response) {
  const userId = req.user.id;
 
  const profile = await cache.get(
    `profile:${userId}`,
    async () => {
      const { rows } = await db.query(
        `SELECT u.*, COUNT(p.id) as post_count
         FROM users u
         LEFT JOIN posts p ON p.user_id = u.id
         WHERE u.id = $1
         GROUP BY u.id`,
        [userId],
      );
      return rows[0];
    },
    { ttlSeconds: 600, staleWhileRevalidateSeconds: 120 },
  );
 
  setCacheHeaders(res, {
    isPublic: false, // user-specific data — no CDN
    maxAge: 0, // browser always revalidates
  });
 
  res.json(profile);
}

Cache Invalidation: The Two Hard Problems#

Phil Karlton was right: cache invalidation is one of the two hard things in computer science. Here are the patterns that actually work in production:

typescript

// Pattern 1: Event-driven invalidation
// When data changes, publish an event, and cache subscribers handle it
 
import { EventEmitter } from "events";
 
class CacheInvalidationBus {
  private emitter = new EventEmitter();
 
  // Register invalidation rules
  constructor(private cache: CacheManager) {
    // When a user changes, invalidate user cache and any derived caches
    this.on("user:updated", async (userId: string) => {
      await cache.invalidate(`user:${userId}`, `profile:${userId}`);
      await cache.invalidatePattern(`user-list:*`);
      await cache.invalidatePattern(`leaderboard:*`);
    });
 
    // When a post changes, invalidate the post and related feeds
    this.on("post:updated", async (postId: string, authorId: string) => {
      await cache.invalidate(`post:${postId}`);
      await cache.invalidatePattern(`feed:*`);
      await cache.invalidatePattern(`posts:author:${authorId}:*`);
    });
 
    // When a comment is added, invalidate the post's comment count
    this.on("comment:created", async (postId: string) => {
      await cache.invalidate(`post:${postId}`, `comments:${postId}`);
    });
  }
 
  private on(event: string, handler: (...args: string[]) => Promise<void>) {
    this.emitter.on(event, (...args: string[]) => {
      handler(...args).catch((err) => {
        console.error(
          JSON.stringify({
            level: "error",
            msg: "cache_invalidation_failed",
            event,
            error: (err as Error).message,
          }),
        );
      });
    });
  }
 
  emit(event: string, ...args: string[]) {
    this.emitter.emit(event, ...args);
  }
}
 
const invalidation = new CacheInvalidationBus(cache);
 
// In your update handlers:
async function updateUserHandler(req: Request, res: Response) {
  const userId = req.params.id;
  await db.query("UPDATE users SET name = $2 WHERE id = $1", [userId, req.body.name]);
  invalidation.emit("user:updated", userId);
  res.json({ success: true });
}
 
// Pattern 2: Versioned cache keys
// Instead of invalidating, bump a version number
// This is simpler and avoids the "did invalidation succeed?" problem
 
class VersionedCache {
  constructor(
    private redis: Redis,
    private cacheManager: CacheManager,
  ) {}
 
  private versionKey(entity: string, id: string): string {
    return `v:${entity}:${id}`;
  }
 
  private cacheKey(entity: string, id: string, version: number): string {
    return `${entity}:${id}:v${version}`;
  }
 
  async get<T>(entity: string, id: string, loader: () => Promise<T>, ttlSeconds: number): Promise<T> {
    // Get current version
    const version = Number(await this.redis.get(this.versionKey(entity, id))) || 1;
    const key = this.cacheKey(entity, id, version);
 
    return this.cacheManager.get(key, loader, { ttlSeconds });
  }
 
  async invalidate(entity: string, id: string): Promise<void> {
    // Just bump the version — old cache entries expire naturally
    await this.redis.incr(this.versionKey(entity, id));
    // No need to find and delete old entries!
  }
}
 
const versionedCache = new VersionedCache(redis, cache);
 
// Usage — invalidation is now just an INCR, never fails
async function getUserVersioned(id: string): Promise<User | null> {
  return versionedCache.get(
    "user",
    id,
    async () => {
      const { rows } = await db.query("SELECT * FROM users WHERE id = $1", [id]);
      return rows[0] || null;
    },
    3600,
  );
}
 
async function updateUserVersioned(id: string, data: Partial<User>) {
  await db.query("UPDATE users SET name = $2 WHERE id = $1", [id, data.name]);
  await versionedCache.invalidate("user", id);
}

Load Balancing and Horizontal Scaling#

The moment one server is not enough, you need to distribute traffic across multiple servers. This sounds simple. It is not.

Nginx as a Load Balancer#

nginx

# /etc/nginx/nginx.conf
# Production load balancer configuration
 
# Worker processes = CPU cores
worker_processes auto;
worker_rlimit_nofile 65535;
 
events {
    worker_connections 65535;
    multi_accept on;
    use epoll;
}
 
http {
    # Basic settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;
 
    # Buffer sizes
    client_body_buffer_size 16k;
    client_header_buffer_size 1k;
    client_max_body_size 50m;
    large_client_header_buffers 4 8k;
 
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 5;  # 5 is the sweet spot (6+ uses too much CPU)
    gzip_min_length 256;
    gzip_types
        text/plain
        text/css
        text/javascript
        application/json
        application/javascript
        application/xml
        image/svg+xml;
 
    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
    limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;
 
    # Connection limiting
    limit_conn_zone $binary_remote_addr zone=addr:10m;
 
    # Upstream (your app servers)
    upstream app_servers {
        # Least connections — best for varying response times
        least_conn;
 
        # App servers with health checks
        server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
        server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
 
        # Keep connections alive to app servers
        keepalive 32;
    }
 
    # WebSocket upstream (if needed)
    upstream ws_servers {
        # IP hash for sticky sessions (WebSocket requires it)
        ip_hash;
 
        server 10.0.1.10:3001;
        server 10.0.1.11:3001;
        server 10.0.1.12:3001;
    }
 
    server {
        listen 80;
        listen 443 ssl http2;
        server_name example.com;
 
        # SSL (use certbot to generate)
        ssl_certificate /etc/ssl/certs/example.com.pem;
        ssl_certificate_key /etc/ssl/private/example.com.key;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_prefer_server_ciphers on;
 
        # Security headers
        add_header X-Frame-Options DENY always;
        add_header X-Content-Type-Options nosniff always;
        add_header X-XSS-Protection "1; mode=block" always;
        add_header Strict-Transport-Security "max-age=63072000" always;
 
        # Static assets — served directly by Nginx
        location /_next/static/ {
            alias /var/www/app/.next/static/;
            expires 365d;
            add_header Cache-Control "public, immutable";
        }
 
        location /static/ {
            alias /var/www/app/public/static/;
            expires 30d;
            add_header Cache-Control "public";
        }
 
        # API routes with rate limiting
        location /api/ {
            limit_req zone=api burst=50 nodelay;
            limit_conn addr 30;
 
            proxy_pass http://app_servers;
            proxy_http_version 1.1;
            proxy_set_header Connection "";  # Enable keepalive
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
 
            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 30s;
            proxy_read_timeout 30s;
        }
 
        # Stricter rate limit for auth endpoints
        location /api/auth/ {
            limit_req zone=login burst=3 nodelay;
 
            proxy_pass http://app_servers;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
 
        # WebSocket
        location /ws/ {
            proxy_pass http://ws_servers;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_read_timeout 3600s;  # 1 hour for WebSocket
            proxy_send_timeout 3600s;
        }
 
        # Health check endpoint (no rate limit, no logging)
        location /api/health {
            access_log off;
            proxy_pass http://app_servers;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
 
        # Everything else
        location / {
            proxy_pass http://app_servers;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}

Graceful Shutdown: The Difference Between "Deploying" and "Breaking Things"#

When you deploy a new version, the old process needs to stop accepting new requests, finish processing in-flight requests, and then exit. If you just kill the process, every in-flight request returns a 502 error. At 1000 requests per second, a 30-second deploy means 30,000 failed requests.

typescript

// Graceful shutdown handler — use this in EVERY production Node.js app
 
import http from "http";
 
interface ShutdownConfig {
  server: http.Server;
  db: DatabasePool;
  redis: Redis;
  gracePeriodMs: number; // How long to wait for in-flight requests
}
 
function setupGracefulShutdown(config: ShutdownConfig) {
  let isShuttingDown = false;
  let activeRequests = 0;
 
  // Track active requests
  config.server.on("request", (_req, res) => {
    if (isShuttingDown) {
      res.writeHead(503, { "Retry-After": "30" });
      res.end("Server is shutting down");
      return;
    }
    activeRequests++;
    res.on("finish", () => {
      activeRequests--;
    });
  });
 
  async function shutdown(signal: string) {
    if (isShuttingDown) return;
    isShuttingDown = true;
 
    console.log(
      JSON.stringify({
        level: "info",
        msg: "shutdown_initiated",
        signal,
        activeRequests,
      }),
    );
 
    // Step 1: Stop accepting new connections
    config.server.close();
 
    // Step 2: Wait for in-flight requests to complete
    const deadline = Date.now() + config.gracePeriodMs;
 
    while (activeRequests > 0 && Date.now() < deadline) {
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
 
    if (activeRequests > 0) {
      console.log(
        JSON.stringify({
          level: "warn",
          msg: "forced_shutdown",
          remainingRequests: activeRequests,
        }),
      );
    }
 
    // Step 3: Close database connections
    try {
      await config.db.shutdown();
    } catch (error) {
      console.error("DB shutdown error:", (error as Error).message);
    }
 
    // Step 4: Close Redis connections
    try {
      config.redis.disconnect();
    } catch (error) {
      console.error("Redis shutdown error:", (error as Error).message);
    }
 
    console.log(
      JSON.stringify({
        level: "info",
        msg: "shutdown_complete",
      }),
    );
 
    process.exit(0);
  }
 
  process.on("SIGTERM", () => shutdown("SIGTERM"));
  process.on("SIGINT", () => shutdown("SIGINT"));
 
  // Prevent crashes from taking down the process without cleanup
  process.on("uncaughtException", (error) => {
    console.error(
      JSON.stringify({
        level: "fatal",
        msg: "uncaught_exception",
        error: error.message,
        stack: error.stack,
      }),
    );
    shutdown("uncaughtException");
  });
 
  process.on("unhandledRejection", (reason) => {
    console.error(
      JSON.stringify({
        level: "fatal",
        msg: "unhandled_rejection",
        reason: String(reason),
      }),
    );
    // Don't shutdown on unhandled rejections — log and continue
    // But DO fix them. Every unhandled rejection is a bug.
  });
}
 
// Usage
const server = app.listen(3000, () => {
  console.log("Server started on port 3000");
});
 
setupGracefulShutdown({
  server,
  db,
  redis,
  gracePeriodMs: 30000, // 30 seconds to finish in-flight requests
});

Health Checks: Letting the Load Balancer Know You Are Alive#

Health checks seem trivial. They are not. A bad health check is worse than no health check — it either cries wolf (healthy server removed from rotation) or hides problems (unhealthy server keeps receiving traffic).

typescript

// Health check endpoint with dependency checking
 
interface HealthStatus {
  status: "healthy" | "degraded" | "unhealthy";
  checks: Record<
    string,
    {
      status: "pass" | "fail";
      latency_ms: number;
      message?: string;
    }
  >;
  uptime_seconds: number;
  version: string;
}
 
const startTime = Date.now();
 
async function healthCheck(): Promise<HealthStatus> {
  const checks: HealthStatus["checks"] = {};
 
  // Check database
  const dbStart = performance.now();
  try {
    await db.query("SELECT 1");
    checks.database = {
      status: "pass",
      latency_ms: Math.round(performance.now() - dbStart),
    };
  } catch (error) {
    checks.database = {
      status: "fail",
      latency_ms: Math.round(performance.now() - dbStart),
      message: (error as Error).message,
    };
  }
 
  // Check Redis
  const redisStart = performance.now();
  try {
    await redis.ping();
    checks.redis = {
      status: "pass",
      latency_ms: Math.round(performance.now() - redisStart),
    };
  } catch (error) {
    checks.redis = {
      status: "fail",
      latency_ms: Math.round(performance.now() - redisStart),
      message: (error as Error).message,
    };
  }
 
  // Check disk space (for apps that write to disk)
  // Check memory usage
  const memUsage = process.memoryUsage();
  const heapUsedPct = (memUsage.heapUsed / memUsage.heapTotal) * 100;
  checks.memory = {
    status: heapUsedPct < 90 ? "pass" : "fail",
    latency_ms: 0,
    message: `Heap: ${Math.round(heapUsedPct)}% (${Math.round(memUsage.heapUsed / 1024 / 1024)}MB)`,
  };
 
  // Determine overall status
  const allPassing = Object.values(checks).every((c) => c.status === "pass");
  const anyFailing = Object.values(checks).some((c) => c.status === "fail");
 
  // Database down = unhealthy (remove from rotation)
  // Redis down = degraded (keep serving, but warn)
  let status: HealthStatus["status"] = "healthy";
  if (checks.database?.status === "fail" || checks.memory?.status === "fail") {
    status = "unhealthy";
  } else if (!allPassing) {
    status = "degraded";
  }
 
  return {
    status,
    checks,
    uptime_seconds: Math.round((Date.now() - startTime) / 1000),
    version: process.env.APP_VERSION || "unknown",
  };
}
 
// Endpoint
app.get("/api/health", async (_req, res) => {
  const health = await healthCheck();
 
  const statusCode = health.status === "unhealthy" ? 503 : 200;
  res.status(statusCode).json(health);
});
 
// Lightweight liveness probe (for Kubernetes)
// This should ALWAYS return 200 unless the process is completely broken
app.get("/api/health/live", (_req, res) => {
  res.status(200).json({ status: "alive" });
});
 
// Readiness probe (for Kubernetes)
// Returns 503 during startup and shutdown
app.get("/api/health/ready", async (_req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: "shutting_down" });
  }
 
  // Check critical dependencies
  try {
    await db.query("SELECT 1");
    res.status(200).json({ status: "ready" });
  } catch {
    res.status(503).json({ status: "not_ready" });
  }
});

Background Jobs and Message Queues#

Here is a truth that took me too long to learn: the fastest way to make your API fast is to stop doing work in the request handler. Send the response immediately, do the work later.

Sending a welcome email? That takes 2 seconds. Processing an uploaded image? That takes 10 seconds. Generating a PDF report? That takes 30 seconds. None of these need to happen before you send the HTTP response.

Building a Production Job Queue#

typescript

// Production job queue with BullMQ
// This is the pattern I use for every application past 1K users
 
import { Queue, Worker, QueueScheduler, Job } from "bullmq";
import { Redis } from "ioredis";
 
// Shared Redis connection for all queues
const connection = new Redis({
  host: process.env.REDIS_HOST,
  port: Number(process.env.REDIS_PORT) || 6379,
  password: process.env.REDIS_PASSWORD,
  maxRetriesPerRequest: null, // Required by BullMQ
});
 
// Define job types
interface JobTypes {
  "email:welcome": { userId: string; email: string };
  "email:password-reset": { userId: string; token: string };
  "image:resize": { imageId: string; sizes: number[] };
  "report:generate": { userId: string; reportType: string; dateRange: [string, string] };
  "analytics:process": { eventBatch: Array<{ type: string; data: unknown }> };
  "webhook:deliver": { url: string; payload: unknown; attemptNumber: number };
}
 
type JobName = keyof JobTypes;
 
// Queue factory
function createQueue(name: string) {
  return new Queue(name, {
    connection,
    defaultJobOptions: {
      removeOnComplete: {
        age: 24 * 3600, // Keep completed jobs for 24 hours
        count: 1000, // Keep last 1000 completed jobs
      },
      removeOnFail: {
        age: 7 * 24 * 3600, // Keep failed jobs for 7 days
      },
      attempts: 3,
      backoff: {
        type: "exponential",
        delay: 1000, // 1s, 2s, 4s
      },
    },
  });
}
 
// Create queues with different priorities
const emailQueue = createQueue("email");
const imageQueue = createQueue("image-processing");
const reportQueue = createQueue("report-generation");
const analyticsQueue = createQueue("analytics");
const webhookQueue = createQueue("webhook-delivery");
 
// Type-safe job enqueue
async function enqueueJob<T extends JobName>(
  name: T,
  data: JobTypes[T],
  options?: { priority?: number; delay?: number; deduplicationId?: string },
): Promise<string> {
  const queueMap: Record<string, Queue> = {
    "email:welcome": emailQueue,
    "email:password-reset": emailQueue,
    "image:resize": imageQueue,
    "report:generate": reportQueue,
    "analytics:process": analyticsQueue,
    "webhook:deliver": webhookQueue,
  };
 
  const queue = queueMap[name];
  if (!queue) throw new Error(`No queue for job type: ${name}`);
 
  // Deduplication — prevent the same job from being enqueued twice
  const jobId = options?.deduplicationId || undefined;
 
  const job = await queue.add(name, data, {
    priority: options?.priority || 0,
    delay: options?.delay || 0,
    jobId,
  });
 
  console.log(
    JSON.stringify({
      level: "info",
      msg: "job_enqueued",
      queue: queue.name,
      jobName: name,
      jobId: job.id,
    }),
  );
 
  return job.id!;
}
 
// Example: Webhook delivery with exponential backoff and dead letter queue
const webhookWorker = new Worker(
  "webhook-delivery",
  async (job: Job) => {
    const { url, payload, attemptNumber } = job.data;
 
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), 10000); // 10s timeout
 
    try {
      const response = await fetch(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-Webhook-Signature": computeHmac(JSON.stringify(payload)),
          "X-Webhook-Attempt": String(attemptNumber),
        },
        body: JSON.stringify(payload),
        signal: controller.signal,
      });
 
      if (!response.ok) {
        throw new Error(`Webhook returned ${response.status}: ${response.statusText}`);
      }
 
      console.log(
        JSON.stringify({
          level: "info",
          msg: "webhook_delivered",
          url,
          attempt: attemptNumber,
          status: response.status,
        }),
      );
    } finally {
      clearTimeout(timeout);
    }
  },
  {
    connection,
    concurrency: 10, // Process 10 webhooks in parallel
    limiter: {
      max: 100, // Max 100 jobs per...
      duration: 1000, // ...second (rate limiting)
    },
  },
);
 
// Handle failed jobs — move to dead letter queue after all retries exhausted
webhookWorker.on("failed", async (job, error) => {
  if (job && job.attemptsMade >= (job.opts.attempts || 3)) {
    console.error(
      JSON.stringify({
        level: "error",
        msg: "webhook_dead_letter",
        jobId: job.id,
        url: job.data.url,
        attempts: job.attemptsMade,
        error: error.message,
      }),
    );
    // Move to dead letter queue for manual review
    await enqueueJob(
      "webhook:deliver",
      {
        ...job.data,
        attemptNumber: job.attemptsMade,
      },
      { priority: 10 },
    ); // low priority in DLQ
  }
});
 
function computeHmac(payload: string): string {
  const crypto = require("crypto");
  return crypto.createHmac("sha256", process.env.WEBHOOK_SECRET!).update(payload).digest("hex");
}
 
// Example API handler that returns immediately
app.post("/api/reports", async (req, res) => {
  const { reportType, startDate, endDate } = req.body;
  const userId = req.user.id;
 
  // Enqueue the job — takes < 5ms
  const jobId = await enqueueJob("report:generate", {
    userId,
    reportType,
    dateRange: [startDate, endDate],
  });
 
  // Return immediately — the report generates in the background
  res.status(202).json({
    message: "Report generation started",
    jobId,
    statusUrl: `/api/reports/status/${jobId}`,
  });
});
 
// Status endpoint for long-running jobs
app.get("/api/reports/status/:jobId", async (req, res) => {
  const job = await reportQueue.getJob(req.params.jobId);
  if (!job) {
    return res.status(404).json({ error: "Job not found" });
  }
 
  const state = await job.getState();
  const progress = job.progress;
 
  res.json({
    jobId: job.id,
    state, // 'waiting' | 'active' | 'completed' | 'failed'
    progress, // 0-100
    result: state === "completed" ? job.returnvalue : undefined,
    failedReason: state === "failed" ? job.failedReason : undefined,
  });
});

Idempotency: Because Things Will Be Processed Twice#

In a distributed system, messages will be delivered more than once. Network timeouts, worker crashes, retry logic — all of these can cause a job to be processed multiple times. Your system must handle this gracefully.

typescript

// Idempotent job processing with deduplication
 
class IdempotencyGuard {
  constructor(private redis: Redis) {}
 
  // Check if a job has already been processed
  async isProcessed(idempotencyKey: string): Promise<boolean> {
    const exists = await this.redis.exists(`idempotent:${idempotencyKey}`);
    return exists === 1;
  }
 
  // Mark a job as processing (with TTL to prevent permanent locks)
  async markProcessing(idempotencyKey: string): Promise<boolean> {
    // NX = only set if not exists, EX = expire after 300 seconds
    const result = await this.redis.set(`idempotent:${idempotencyKey}`, "processing", "EX", 300, "NX");
    return result === "OK";
  }
 
  // Mark a job as completed
  async markCompleted(idempotencyKey: string, ttlSeconds = 86400): Promise<void> {
    await this.redis.set(`idempotent:${idempotencyKey}`, "completed", "EX", ttlSeconds);
  }
}
 
const idempotency = new IdempotencyGuard(redis);
 
// Usage in a payment processing worker
async function processPayment(job: Job) {
  const { paymentId, userId, amount } = job.data;
  const idempotencyKey = `payment:${paymentId}`;
 
  // Check if already processed
  if (await idempotency.isProcessed(idempotencyKey)) {
    console.log(
      JSON.stringify({
        level: "info",
        msg: "duplicate_payment_skipped",
        paymentId,
      }),
    );
    return; // Already processed, skip
  }
 
  // Try to acquire processing lock
  const acquired = await idempotency.markProcessing(idempotencyKey);
  if (!acquired) {
    // Another worker is processing this — skip
    return;
  }
 
  try {
    // Process the payment
    await db.transaction(async (client) => {
      // Debit the user's account
      const result = await client.query(
        `UPDATE accounts
         SET balance = balance - $1
         WHERE user_id = $2 AND balance >= $1
         RETURNING balance`,
        [amount, userId],
      );
 
      if (result.rowCount === 0) {
        throw new Error("Insufficient balance");
      }
 
      // Record the transaction
      await client.query(
        `INSERT INTO transactions (payment_id, user_id, amount, status, created_at)
         VALUES ($1, $2, $3, 'completed', now())
         ON CONFLICT (payment_id) DO NOTHING`, // Extra safety: DB-level idempotency
        [paymentId, userId, amount],
      );
    });
 
    await idempotency.markCompleted(idempotencyKey);
 
    console.log(
      JSON.stringify({
        level: "info",
        msg: "payment_processed",
        paymentId,
        amount,
        userId,
      }),
    );
  } catch (error) {
    // Clear the processing lock so it can be retried
    await redis.del(`idempotent:${idempotencyKey}`);
    throw error;
  }
}

CDN and Edge Computing#

A CDN is not just "put Cloudflare in front of your site." At scale, your CDN strategy determines whether your users get a 50ms response or a 500ms response, and whether your origin servers handle 1000 requests per second or 100,000.

Static Asset Optimization Pipeline#

typescript

// Build-time asset optimization pipeline
// This runs during your CI/CD build step
 
import sharp from "sharp";
import { createHash } from "crypto";
import { readFile, writeFile, readdir, mkdir } from "fs/promises";
import { join, extname, basename } from "path";
 
interface OptimizedAsset {
  originalPath: string;
  optimizedPath: string;
  originalSize: number;
  optimizedSize: number;
  hash: string;
  format: string;
}
 
class AssetPipeline {
  private manifest: Map<string, OptimizedAsset> = new Map();
 
  constructor(
    private inputDir: string,
    private outputDir: string,
  ) {}
 
  async processImages(): Promise<void> {
    const files = await readdir(this.inputDir, { recursive: true });
    const imageFiles = files.filter((f) => /\.(png|jpg|jpeg|webp|gif)$/i.test(String(f)));
 
    for (const file of imageFiles) {
      const filePath = join(this.inputDir, String(file));
      await this.optimizeImage(filePath);
    }
 
    // Write manifest for the application to use
    const manifestObj = Object.fromEntries(this.manifest);
    await writeFile(join(this.outputDir, "asset-manifest.json"), JSON.stringify(manifestObj, null, 2));
  }
 
  private async optimizeImage(inputPath: string): Promise<void> {
    const buffer = await readFile(inputPath);
    const hash = createHash("md5").update(buffer).digest("hex").substring(0, 8);
    const name = basename(inputPath, extname(inputPath));
 
    // Generate multiple formats
    const formats = [
      { ext: "webp", processor: sharp(buffer).webp({ quality: 80 }) },
      { ext: "avif", processor: sharp(buffer).avif({ quality: 65 }) },
    ];
 
    // Generate multiple sizes for responsive images
    const sizes = [320, 640, 960, 1280, 1920];
    const metadata = await sharp(buffer).metadata();
    const originalWidth = metadata.width || 1920;
 
    for (const format of formats) {
      for (const width of sizes) {
        if (width > originalWidth) continue;
 
        const outputName = `${name}-${width}w-${hash}.${format.ext}`;
        const outputPath = join(this.outputDir, "images", outputName);
 
        await mkdir(join(this.outputDir, "images"), { recursive: true });
 
        const result = await sharp(buffer)
          .resize(width)
          [format.ext === "webp" ? "webp" : "avif"](format.ext === "webp" ? { quality: 80 } : { quality: 65 })
          .toFile(outputPath);
 
        this.manifest.set(`${name}-${width}.${format.ext}`, {
          originalPath: inputPath,
          optimizedPath: outputPath,
          originalSize: buffer.length,
          optimizedSize: result.size,
          hash,
          format: format.ext,
        });
      }
    }
  }
}
 
// Responsive image component that uses the optimized assets
// This is the server component that generates the HTML
 
function OptimizedImage({
  src,
  alt,
  sizes = "100vw",
  className,
  priority = false,
}: {
  src: string;
  alt: string;
  sizes?: string;
  className?: string;
  priority?: boolean;
}) {
  const baseName = src.replace(/\.[^.]+$/, "");
  const widths = [320, 640, 960, 1280, 1920];
 
  const avifSrcSet = widths.map((w) => `/_next/static/images/${baseName}-${w}w.avif ${w}w`).join(", ");
 
  const webpSrcSet = widths.map((w) => `/_next/static/images/${baseName}-${w}w.webp ${w}w`).join(", ");
 
  return `
    <picture>
      <source type="image/avif" srcSet="${avifSrcSet}" sizes="${sizes}" />
      <source type="image/webp" srcSet="${webpSrcSet}" sizes="${sizes}" />
      <img
        src="/_next/static/images/${baseName}-960w.webp"
        alt="${alt}"
        class="${className || ""}"
        loading="${priority ? "eager" : "lazy"}"
        decoding="async"
        ${priority ? 'fetchpriority="high"' : ""}
      />
    </picture>
  `;
}

Edge Caching with Surrogate Keys#

The real power of a CDN is not caching static files — it is caching dynamic responses. But you need a way to invalidate specific cached responses when the underlying data changes. Surrogate keys solve this.

typescript

// Surrogate key-based cache invalidation
// This works with CDNs that support surrogate keys (most enterprise CDNs)
 
class EdgeCacheManager {
  private surrogateKeys: Set<string> = new Set();
 
  // Tag a response with surrogate keys
  addKey(key: string): void {
    this.surrogateKeys.add(key);
  }
 
  // Get the header value to send with the response
  getHeader(): string {
    return Array.from(this.surrogateKeys).join(" ");
  }
 
  // Purge all responses tagged with a specific key
  static async purge(keys: string[]): Promise<void> {
    // This calls your CDN's purge API
    // Example for a generic CDN API:
    try {
      await fetch(process.env.CDN_PURGE_URL!, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.CDN_API_TOKEN}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          surrogate_keys: keys,
        }),
      });
 
      console.log(
        JSON.stringify({
          level: "info",
          msg: "cdn_purge",
          keys,
        }),
      );
    } catch (error) {
      console.error(
        JSON.stringify({
          level: "error",
          msg: "cdn_purge_failed",
          keys,
          error: (error as Error).message,
        }),
      );
    }
  }
}
 
// Middleware that sets cache headers and surrogate keys
function edgeCacheMiddleware(req: Request, res: Response, next: NextFunction) {
  const edgeCache = new EdgeCacheManager();
  (res as any).edgeCache = edgeCache;
 
  // Override res.json to add cache headers
  const originalJson = res.json.bind(res);
  res.json = (body: unknown) => {
    const keys = edgeCache.getHeader();
    if (keys) {
      res.setHeader("Surrogate-Key", keys);
    }
    return originalJson(body);
  };
 
  next();
}
 
// Usage in route handlers
app.get("/api/products/:id", edgeCacheMiddleware, async (req, res) => {
  const product = await getProduct(req.params.id);
 
  // Tag this response so we can purge it later
  (res as any).edgeCache.addKey(`product:${req.params.id}`);
  (res as any).edgeCache.addKey(`products`);
  (res as any).edgeCache.addKey(`category:${product.category}`);
 
  res.setHeader("Cache-Control", "public, max-age=60, s-maxage=3600");
  res.json(product);
});
 
// When a product is updated, purge its cached responses
app.put("/api/products/:id", async (req, res) => {
  await updateProduct(req.params.id, req.body);
 
  // Purge CDN cache for this product and its category
  const product = await getProduct(req.params.id);
  await EdgeCacheManager.purge([
    `product:${req.params.id}`,
    `category:${product.category}`,
    "products", // Also purge any product listing pages
  ]);
 
  res.json({ success: true });
});

Monitoring and Observability at Scale#

You cannot fix what you cannot see. At 100 users, console.log is fine. At 100K users, you need structured logging, distributed tracing, and an alerting strategy that wakes you up for real problems and lets you sleep through noise.

Structured Logging That Actually Helps#

typescript

// Production logging that you can actually query and alert on
 
enum LogLevel {
  DEBUG = "debug",
  INFO = "info",
  WARN = "warn",
  ERROR = "error",
  FATAL = "fatal",
}
 
interface LogEntry {
  timestamp: string;
  level: LogLevel;
  msg: string;
  service: string;
  traceId?: string;
  spanId?: string;
  userId?: string;
  [key: string]: unknown;
}
 
class StructuredLogger {
  private service: string;
  private defaultFields: Record<string, unknown>;
 
  constructor(service: string, defaultFields: Record<string, unknown> = {}) {
    this.service = service;
    this.defaultFields = defaultFields;
  }
 
  private log(level: LogLevel, msg: string, fields: Record<string, unknown> = {}) {
    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      msg,
      service: this.service,
      ...this.defaultFields,
      ...fields,
    };
 
    // Mask PII in log entries
    const masked = this.maskPII(entry);
 
    // Write to stdout — let the log aggregator handle routing
    process.stdout.write(JSON.stringify(masked) + "\n");
  }
 
  debug(msg: string, fields?: Record<string, unknown>) {
    this.log(LogLevel.DEBUG, msg, fields);
  }
  info(msg: string, fields?: Record<string, unknown>) {
    this.log(LogLevel.INFO, msg, fields);
  }
  warn(msg: string, fields?: Record<string, unknown>) {
    this.log(LogLevel.WARN, msg, fields);
  }
  error(msg: string, fields?: Record<string, unknown>) {
    this.log(LogLevel.ERROR, msg, fields);
  }
  fatal(msg: string, fields?: Record<string, unknown>) {
    this.log(LogLevel.FATAL, msg, fields);
  }
 
  // Create a child logger with additional default fields
  child(fields: Record<string, unknown>): StructuredLogger {
    const child = new StructuredLogger(this.service, {
      ...this.defaultFields,
      ...fields,
    });
    return child;
  }
 
  private maskPII(entry: LogEntry): LogEntry {
    const masked = { ...entry };
    // Mask email addresses
    if (typeof masked.email === "string") {
      const [name, domain] = masked.email.split("@");
      masked.email = `${name[0]}***@${domain}`;
    }
    // Mask IP addresses (keep first two octets for geo debugging)
    if (typeof masked.ip === "string") {
      const parts = masked.ip.split(".");
      masked.ip = `${parts[0]}.${parts[1]}.x.x`;
    }
    // Remove any field named password, secret, token, etc.
    for (const key of Object.keys(masked)) {
      if (/password|secret|token|authorization|cookie/i.test(key)) {
        masked[key] = "[REDACTED]";
      }
    }
    return masked;
  }
}
 
// Create logger per module
const logger = new StructuredLogger("api-server");
 
// Request logging middleware
function requestLogger(req: Request, res: Response, next: NextFunction) {
  const start = performance.now();
  const traceId = (req.headers["x-trace-id"] as string) || generateTraceId();
 
  // Create a request-scoped logger
  const reqLogger = logger.child({
    traceId,
    method: req.method,
    path: req.path,
    userAgent: req.headers["user-agent"],
    ip: req.ip,
  });
 
  // Attach to request for use in handlers
  (req as any).logger = reqLogger;
  (req as any).traceId = traceId;
 
  // Set trace ID header on response
  res.setHeader("X-Trace-Id", traceId);
 
  // Log when response is sent
  res.on("finish", () => {
    const duration = performance.now() - start;
    const level = res.statusCode >= 500 ? "error" : res.statusCode >= 400 ? "warn" : "info";
 
    reqLogger[level]("request_completed", {
      statusCode: res.statusCode,
      duration_ms: Math.round(duration),
      contentLength: res.getHeader("content-length"),
    });
  });
 
  next();
}
 
function generateTraceId(): string {
  return Array.from(crypto.getRandomValues(new Uint8Array(16)))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

The RED Method: What to Measure#

The RED method is the simplest framework for service monitoring that actually works. For every service, measure:

Rate: Requests per second
Errors: Errors per second (and error rate as a percentage)
Duration: Response time distribution (p50, p95, p99)

typescript

// RED metrics collection
 
class RedMetrics {
  private requests: number[] = []; // timestamps of requests
  private errors: number[] = []; // timestamps of errors
  private durations: number[] = []; // response times in ms
  private readonly windowMs = 60000; // 1-minute sliding window
  private readonly maxSamples = 10000;
 
  recordRequest(duration: number, isError: boolean) {
    const now = Date.now();
    this.requests.push(now);
    this.durations.push(duration);
    if (isError) this.errors.push(now);
 
    // Trim old data
    if (this.requests.length > this.maxSamples) {
      this.cleanup();
    }
  }
 
  getMetrics(): {
    rate: number;
    errorRate: number;
    p50: number;
    p95: number;
    p99: number;
  } {
    this.cleanup();
    const windowStart = Date.now() - this.windowMs;
 
    const recentRequests = this.requests.filter((t) => t >= windowStart);
    const recentErrors = this.errors.filter((t) => t >= windowStart);
    const recentDurations = [...this.durations].sort((a, b) => a - b);
 
    const rate = recentRequests.length / (this.windowMs / 1000);
    const errorRate = recentRequests.length > 0 ? (recentErrors.length / recentRequests.length) * 100 : 0;
 
    return {
      rate: Math.round(rate * 100) / 100,
      errorRate: Math.round(errorRate * 100) / 100,
      p50: this.percentile(recentDurations, 50),
      p95: this.percentile(recentDurations, 95),
      p99: this.percentile(recentDurations, 99),
    };
  }
 
  private percentile(sorted: number[], pct: number): number {
    if (sorted.length === 0) return 0;
    const index = Math.ceil((pct / 100) * sorted.length) - 1;
    return Math.round(sorted[Math.max(0, index)]);
  }
 
  private cleanup() {
    const cutoff = Date.now() - this.windowMs * 5; // Keep 5 minutes of data
    this.requests = this.requests.filter((t) => t >= cutoff);
    this.errors = this.errors.filter((t) => t >= cutoff);
    // Only keep recent durations
    if (this.durations.length > this.maxSamples) {
      this.durations = this.durations.slice(-this.maxSamples);
    }
  }
}
 
// Per-route metrics
const routeMetrics = new Map<string, RedMetrics>();
 
function getRouteMetrics(route: string): RedMetrics {
  if (!routeMetrics.has(route)) {
    routeMetrics.set(route, new RedMetrics());
  }
  return routeMetrics.get(route)!;
}
 
// Metrics middleware
function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
  const start = performance.now();
 
  res.on("finish", () => {
    const duration = performance.now() - start;
    const route = req.route?.path || req.path;
    const isError = res.statusCode >= 500;
 
    getRouteMetrics(route).recordRequest(duration, isError);
    getRouteMetrics("_global").recordRequest(duration, isError);
  });
 
  next();
}
 
// Metrics endpoint (scrape this with your monitoring tool)
app.get("/api/metrics", (_req, res) => {
  const metrics: Record<string, ReturnType<RedMetrics["getMetrics"]>> = {};
 
  for (const [route, m] of routeMetrics) {
    metrics[route] = m.getMetrics();
  }
 
  res.json({
    timestamp: new Date().toISOString(),
    metrics,
    process: {
      uptime: process.uptime(),
      memory: process.memoryUsage(),
      cpu: process.cpuUsage(),
    },
  });
});

SLOs and SLIs: Defining "Good Enough"#

Service Level Objectives tell you whether your system is healthy in terms that users care about. Not "CPU is at 40%," but "99.9% of requests complete in under 500ms."

typescript

// SLO monitoring and error budget tracking
 
interface SLO {
  name: string;
  target: number; // e.g., 0.999 = 99.9%
  window: "rolling_7d" | "rolling_30d" | "calendar_month";
  indicator: SLI;
}
 
interface SLI {
  type: "availability" | "latency" | "quality";
  goodEventFn: (req: RequestLog) => boolean;
  validEventFn: (req: RequestLog) => boolean;
}
 
interface RequestLog {
  statusCode: number;
  duration: number;
  path: string;
}
 
class SLOTracker {
  private logs: RequestLog[] = [];
  private readonly maxLogs = 1_000_000;
 
  constructor(private slos: SLO[]) {}
 
  record(log: RequestLog) {
    this.logs.push(log);
    if (this.logs.length > this.maxLogs) {
      // Keep last 7 days
      const cutoff = Date.now() - 7 * 24 * 60 * 60 * 1000;
      this.logs = this.logs.slice(-Math.floor(this.maxLogs / 2));
    }
  }
 
  getStatus(): Array<{
    name: string;
    target: number;
    current: number;
    errorBudgetRemaining: number;
    status: "ok" | "warning" | "critical";
  }> {
    return this.slos.map((slo) => {
      const validEvents = this.logs.filter((log) => slo.indicator.validEventFn(log));
      const goodEvents = validEvents.filter((log) => slo.indicator.goodEventFn(log));
 
      const current = validEvents.length > 0 ? goodEvents.length / validEvents.length : 1;
 
      // Error budget: how many more failures can we tolerate?
      const totalBudget = (1 - slo.target) * validEvents.length;
      const usedBudget = validEvents.length - goodEvents.length;
      const errorBudgetRemaining = totalBudget > 0 ? Math.max(0, (totalBudget - usedBudget) / totalBudget) : 1;
 
      let status: "ok" | "warning" | "critical" = "ok";
      if (errorBudgetRemaining < 0.1) status = "critical";
      else if (errorBudgetRemaining < 0.3) status = "warning";
 
      return {
        name: slo.name,
        target: slo.target,
        current: Math.round(current * 10000) / 10000,
        errorBudgetRemaining: Math.round(errorBudgetRemaining * 1000) / 1000,
        status,
      };
    });
  }
}
 
// Define your SLOs
const sloTracker = new SLOTracker([
  {
    name: "API Availability",
    target: 0.999, // 99.9%
    window: "rolling_30d",
    indicator: {
      type: "availability",
      validEventFn: (req) => !req.path.startsWith("/api/health"),
      goodEventFn: (req) => req.statusCode < 500,
    },
  },
  {
    name: "API Latency (p99 < 500ms)",
    target: 0.99, // 99%
    window: "rolling_7d",
    indicator: {
      type: "latency",
      validEventFn: (req) => req.path.startsWith("/api/"),
      goodEventFn: (req) => req.duration < 500,
    },
  },
  {
    name: "Checkout Success Rate",
    target: 0.995, // 99.5%
    window: "rolling_7d",
    indicator: {
      type: "quality",
      validEventFn: (req) => req.path === "/api/checkout",
      goodEventFn: (req) => req.statusCode < 400,
    },
  },
]);
 
// Alert when error budget is running low
setInterval(() => {
  const statuses = sloTracker.getStatus();
  for (const slo of statuses) {
    if (slo.status === "critical") {
      console.error(
        JSON.stringify({
          level: "error",
          msg: "slo_budget_critical",
          slo: slo.name,
          current: slo.current,
          target: slo.target,
          budgetRemaining: slo.errorBudgetRemaining,
        }),
      );
      // Trigger PagerDuty/Opsgenie/whatever
    } else if (slo.status === "warning") {
      console.warn(
        JSON.stringify({
          level: "warn",
          msg: "slo_budget_warning",
          slo: slo.name,
          current: slo.current,
          target: slo.target,
          budgetRemaining: slo.errorBudgetRemaining,
        }),
      );
    }
  }
}, 60000); // Check every minute

The Mistakes That Kill Startups#

These are real incidents. Names changed, numbers approximate, lessons permanent.

The Unindexed Query That Cost $200K#

A SaaS application had a dashboard that showed "recent activity." The query was:

sql

-- This query ran on every dashboard page load
-- There was no index on (organization_id, created_at)
SELECT *
FROM activity_log
WHERE organization_id = $1
ORDER BY created_at DESC
LIMIT 50;

At 10K rows, it returned in 5ms. At 10M rows, it took 8 seconds. But it did not fail immediately — it degraded slowly. By the time anyone noticed, the database CPU was at 100%, other queries were timing out, and the site was essentially down for 4 hours during US business hours.

The fix was a single line:

sql

CREATE INDEX CONCURRENTLY idx_activity_org_created
  ON activity_log (organization_id, created_at DESC);

Query time: 8 seconds to 2ms. The CONCURRENTLY keyword is important — without it, the CREATE INDEX statement locks the table and blocks all writes for the duration of the index build. On a 10M-row table, that could be 10+ minutes of downtime.

Lesson: Run EXPLAIN ANALYZE on every query that touches a table with more than 10K rows. Set up slow query logging from day one. Do not wait until it hurts.

The Missing Connection Pool#

A Node.js application created a new database connection for every request. At 50 requests per second, this was 50 connections opened and closed every second. Each connection takes ~50ms to establish (TCP handshake, SSL negotiation, authentication). That is 2.5 seconds of connection overhead per second of wall time.

But worse: under load, the application would open more connections than PostgreSQL could handle. Once you hit max_connections, new connection attempts fail, which triggers retries, which creates more connection attempts, which creates a cascading failure.

typescript

// WHAT THE CODE LOOKED LIKE (this is bad, do not do this)
import { Client } from "pg";
 
app.get("/api/data", async (req, res) => {
  // New connection per request — this is the bug
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    const result = await client.query("SELECT * FROM data WHERE id = $1", [req.params.id]);
    res.json(result.rows[0]);
  } finally {
    await client.end(); // Connection destroyed after every request
  }
});
 
// THE FIX: Use a Pool (this is what you should do)
import { Pool } from "pg";
 
// One pool, created at startup, shared across all requests
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum 20 connections to the database
  idleTimeoutMillis: 30000,
});
 
app.get("/api/data", async (req, res) => {
  // Pool manages connection lifecycle
  const result = await pool.query("SELECT * FROM data WHERE id = $1", [req.params.id]);
  res.json(result.rows[0]);
});

The Cache Stampede#

An e-commerce site cached its product catalog in Redis with a 5-minute TTL. When the cache expired, the next request would hit the database and repopulate the cache. This worked perfectly at low traffic.

At 10K requests per second, here is what happened when the cache expired:

Request 1 checks cache: miss
Request 2 checks cache: miss (request 1 has not finished loading yet)
Request 3 checks cache: miss
...
Request 500 checks cache: miss (all within the same 200ms)
500 simultaneous database queries for the same data
Database CPU spikes to 100%
Database connections exhausted
All 500 queries timeout
None of them populate the cache
The next 500 requests also miss the cache
Repeat until someone pages the on-call engineer

The fix is the stampede protection pattern shown earlier in this article — use a distributed lock so only one process loads from the database, and everyone else waits for the result. Here it is distilled:

typescript

// The stampede protection pattern in its simplest form
 
async function getWithStampedeProtection<T>(key: string, loader: () => Promise<T>, ttlSeconds: number): Promise<T> {
  // Check cache first
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
 
  // Try to acquire lock
  const lockKey = `lock:${key}`;
  const locked = await redis.set(lockKey, "1", "EX", 30, "NX");
 
  if (locked) {
    // We got the lock — load from source
    try {
      const value = await loader();
      await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
      return value;
    } finally {
      await redis.del(lockKey);
    }
  } else {
    // Someone else is loading — wait and retry
    await new Promise((r) => setTimeout(r, 200));
    const retryCache = await redis.get(key);
    if (retryCache) return JSON.parse(retryCache);
    // If still no cache, load anyway (lock holder might have failed)
    return loader();
  }
}

The Memory Leak That Only Shows Up at Scale#

Node.js applications have a special class of memory leak: event listener accumulation. Every time you add an event listener and forget to remove it, you leak a small amount of memory. At 10 requests per second, you might leak 1KB per second. You will not notice for days.

typescript

// THE MEMORY LEAK (extremely common in WebSocket/SSE code)
 
app.get("/api/stream", (req, res) => {
  res.writeHead(200, { "Content-Type": "text/event-stream" });
 
  // THIS IS THE LEAK
  // Every new SSE connection adds a listener that is never removed
  eventEmitter.on("update", (data) => {
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  });
 
  // When the client disconnects, the listener stays forever
  // After 10K connections, you have 10K orphaned listeners
  // Each one references the entire closure scope
});
 
// THE FIX: Always clean up listeners
app.get("/api/stream", (req, res) => {
  res.writeHead(200, { "Content-Type": "text/event-stream" });
 
  const handler = (data: unknown) => {
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  };
 
  eventEmitter.on("update", handler);
 
  // Clean up when connection closes
  req.on("close", () => {
    eventEmitter.removeListener("update", handler);
  });
 
  // Also handle errors
  req.on("error", () => {
    eventEmitter.removeListener("update", handler);
  });
});
 
// MONITORING: Track event listener counts
setInterval(() => {
  const listenerCounts: Record<string, number> = {};
  for (const event of eventEmitter.eventNames()) {
    listenerCounts[String(event)] = eventEmitter.listenerCount(event);
  }
 
  // Alert if any event has too many listeners
  for (const [event, count] of Object.entries(listenerCounts)) {
    if (count > 100) {
      console.error(
        JSON.stringify({
          level: "error",
          msg: "listener_leak_suspected",
          event,
          count,
        }),
      );
    }
  }
}, 30000);

The Retry Storm#

A microservices architecture where Service A calls Service B, which calls Service C. Each service has a 3-retry policy with no backoff.

Service C goes down. Service B retries 3 times per request. Service A retries 3 times per request. Each of Service A's retries triggers Service B, which retries 3 times.

Result: 1 original request turns into 1 _ 3 _ 3 = 9 requests to Service C. With 1000 requests per second to Service A, Service C receives 9000 requests per second — exactly when it is least able to handle them.

typescript

// The retry strategy that does not create storms
 
interface RetryConfig {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number; // 0 to 1
  retryableStatuses: number[];
}
 
async function fetchWithRetry(
  url: string,
  options: RequestInit,
  config: RetryConfig = {
    maxRetries: 3,
    baseDelayMs: 1000,
    maxDelayMs: 30000,
    jitterFactor: 0.5,
    retryableStatuses: [429, 502, 503, 504],
  },
): Promise<Response> {
  let lastError: Error | null = null;
 
  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      const controller = new AbortController();
      const timeout = setTimeout(
        () => controller.abort(),
        10000, // 10s timeout per attempt
      );
 
      const response = await fetch(url, {
        ...options,
        signal: controller.signal,
      });
 
      clearTimeout(timeout);
 
      // Check if we should retry based on status code
      if (config.retryableStatuses.includes(response.status) && attempt < config.maxRetries) {
        // Respect Retry-After header if present
        const retryAfter = response.headers.get("Retry-After");
        if (retryAfter) {
          const waitMs = Number(retryAfter) * 1000;
          await new Promise((r) => setTimeout(r, Math.min(waitMs, config.maxDelayMs)));
          continue;
        }
 
        await exponentialBackoff(attempt, config);
        continue;
      }
 
      return response;
    } catch (error) {
      lastError = error as Error;
 
      if (attempt < config.maxRetries) {
        await exponentialBackoff(attempt, config);
      }
    }
  }
 
  throw lastError || new Error(`Failed after ${config.maxRetries} retries`);
}
 
function exponentialBackoff(attempt: number, config: RetryConfig): Promise<void> {
  // Exponential backoff: 1s, 2s, 4s, 8s, ...
  const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
  const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
 
  // Add jitter to prevent thundering herd
  // Without jitter, all retries happen at the same time
  const jitter = cappedDelay * config.jitterFactor * Math.random();
  const delay = cappedDelay + jitter;
 
  return new Promise((resolve) => setTimeout(resolve, delay));
}
 
// Circuit breaker: stop calling a service that is clearly down
 
enum CircuitState {
  CLOSED = "closed", // Normal operation
  OPEN = "open", // Service is down, fail fast
  HALF_OPEN = "half_open", // Testing if service recovered
}
 
class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failures = 0;
  private lastFailureTime = 0;
  private successesInHalfOpen = 0;
 
  constructor(
    private readonly failureThreshold: number = 5,
    private readonly resetTimeoutMs: number = 30000,
    private readonly halfOpenSuccesses: number = 3,
  ) {}
 
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      // Check if enough time has passed to try again
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        this.state = CircuitState.HALF_OPEN;
        this.successesInHalfOpen = 0;
      } else {
        throw new Error("Circuit breaker is open — service unavailable");
      }
    }
 
    try {
      const result = await fn();
 
      if (this.state === CircuitState.HALF_OPEN) {
        this.successesInHalfOpen++;
        if (this.successesInHalfOpen >= this.halfOpenSuccesses) {
          this.state = CircuitState.CLOSED;
          this.failures = 0;
        }
      } else {
        this.failures = 0;
      }
 
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = Date.now();
 
      if (this.failures >= this.failureThreshold) {
        this.state = CircuitState.OPEN;
        console.error(
          JSON.stringify({
            level: "error",
            msg: "circuit_breaker_opened",
            failures: this.failures,
          }),
        );
      }
 
      throw error;
    }
  }
 
  getState(): CircuitState {
    return this.state;
  }
}
 
// Usage
const paymentServiceBreaker = new CircuitBreaker(5, 30000, 3);
 
async function chargeCustomer(amount: number): Promise<ChargeResult> {
  return paymentServiceBreaker.execute(async () => {
    const response = await fetchWithRetry("https://payment-service.internal/charge", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ amount }),
    });
    return response.json();
  });
}

Cost Optimization: Spend Money Where It Matters#

The biggest cost trap in scaling is spending too early on things you do not need yet, or spending too little on things that are actively losing you customers.

The Cost Hierarchy#

Here is roughly what a 1M-user application costs per month, in descending order of spend:

typescript

// Typical monthly costs at different scales
// These are real numbers from applications I have worked on
 
const costBreakdown = {
  "10K_users": {
    compute: "$50-200", // Single VPS or small cloud instance
    database: "$50-100", // Managed PostgreSQL, small instance
    cache: "$0-30", // Redis on same server or small instance
    cdn: "$0-10", // Free tier usually sufficient
    monitoring: "$0-50", // Free tier of Datadog/New Relic
    totalMonthly: "$100-400",
    perUserMonthly: "$0.01-0.04",
  },
  "100K_users": {
    compute: "$500-2000", // 2-4 app servers
    database: "$200-800", // Read replicas, bigger instance
    cache: "$100-300", // Dedicated Redis cluster
    cdn: "$50-200", // Significant bandwidth
    monitoring: "$200-500", // Proper APM
    backgroundJobs: "$100-300", // Dedicated worker servers
    totalMonthly: "$1000-4000",
    perUserMonthly: "$0.01-0.04",
  },
  "1M_users": {
    compute: "$3000-15000", // 10-30 app servers, auto-scaling
    database: "$2000-8000", // Multi-replica, connection pooling
    cache: "$500-2000", // Redis cluster with replication
    cdn: "$500-3000", // Multi-TB bandwidth
    monitoring: "$1000-3000", // Full observability stack
    backgroundJobs: "$500-2000",
    search: "$500-2000", // Dedicated search cluster
    totalMonthly: "$8000-35000",
    perUserMonthly: "$0.008-0.035",
  },
};
 
// THE KEY INSIGHT: Cost per user should DECREASE as you scale
// If it's increasing, something is architecturally wrong

Right-Sizing: Stop Paying for Idle Resources#

typescript

// Auto-scaling configuration based on real metrics
// This is the logic — implement via your cloud provider's auto-scaling API
 
interface ScalingPolicy {
  metric: string;
  scaleUpThreshold: number;
  scaleDownThreshold: number;
  scaleUpCooldownMs: number;
  scaleDownCooldownMs: number;
  minInstances: number;
  maxInstances: number;
}
 
class AutoScaler {
  private lastScaleUp = 0;
  private lastScaleDown = 0;
  private currentInstances: number;
 
  constructor(private policy: ScalingPolicy) {
    this.currentInstances = policy.minInstances;
  }
 
  evaluate(currentMetricValue: number): "scale_up" | "scale_down" | "no_change" {
    const now = Date.now();
 
    // Scale up: fast (react to load spikes quickly)
    if (
      currentMetricValue > this.policy.scaleUpThreshold &&
      this.currentInstances < this.policy.maxInstances &&
      now - this.lastScaleUp > this.policy.scaleUpCooldownMs
    ) {
      this.currentInstances++;
      this.lastScaleUp = now;
      console.log(
        JSON.stringify({
          level: "info",
          msg: "scaling_up",
          instances: this.currentInstances,
          metric: currentMetricValue,
        }),
      );
      return "scale_up";
    }
 
    // Scale down: slow (avoid flapping)
    if (
      currentMetricValue < this.policy.scaleDownThreshold &&
      this.currentInstances > this.policy.minInstances &&
      now - this.lastScaleDown > this.policy.scaleDownCooldownMs
    ) {
      this.currentInstances--;
      this.lastScaleDown = now;
      console.log(
        JSON.stringify({
          level: "info",
          msg: "scaling_down",
          instances: this.currentInstances,
          metric: currentMetricValue,
        }),
      );
      return "scale_down";
    }
 
    return "no_change";
  }
}
 
// Production scaling policy
const scaler = new AutoScaler({
  metric: "cpu_utilization",
  scaleUpThreshold: 70, // Scale up at 70% CPU
  scaleDownThreshold: 30, // Scale down at 30% CPU
  scaleUpCooldownMs: 60000, // Wait 1 min between scale-ups
  scaleDownCooldownMs: 300000, // Wait 5 min between scale-downs (slow!)
  minInstances: 2, // Always at least 2 for redundancy
  maxInstances: 20,
});
 
// COST OPTIMIZATION TIPS (from expensive experience):
//
// 1. Use spot/preemptible instances for workers (60-90% cheaper)
//    But NOT for your API servers (getting terminated mid-request is bad)
//
// 2. Reserved instances for baseline capacity (30-50% cheaper)
//    Spot for burst capacity
//
// 3. Database: Start with the biggest instance you can afford
//    Scaling up a database is easy. Scaling it down is terrifying.
//    A $400/month database instance is cheap insurance.
//
// 4. CDN: Cache aggressively. Every request the CDN serves is a request
//    your servers don't have to handle. At $0.01/10K requests on origin
//    vs $0.001/10K on CDN, the math is obvious.
//
// 5. Background jobs: Batch operations. Instead of sending 1000 individual
//    emails, send 1 batch request to your email provider. Instead of
//    1000 individual database inserts, use COPY or multi-row INSERT.

The Real Cost of Technical Debt#

typescript

// A framework for estimating the cost of technical debt
// Use this to justify refactoring work to your manager
 
interface TechDebtItem {
  description: string;
  weeklyEngineerHoursCost: number; // Hours spent working around it
  incidentRiskPerMonth: number; // Probability of causing an incident
  avgIncidentCostDollars: number; // Revenue loss + engineer time per incident
  fixEffortHours: number; // Hours to fix properly
}
 
function calculateDebtROI(item: TechDebtItem): {
  monthlyOngoingCost: number;
  expectedMonthlyIncidentCost: number;
  totalMonthlyCost: number;
  fixCost: number;
  paybackMonths: number;
} {
  const engineerHourlyCost = 100; // Loaded cost (salary + benefits + overhead)
  const monthlyOngoingCost = item.weeklyEngineerHoursCost * 4 * engineerHourlyCost;
  const expectedMonthlyIncidentCost = item.incidentRiskPerMonth * item.avgIncidentCostDollars;
  const totalMonthlyCost = monthlyOngoingCost + expectedMonthlyIncidentCost;
  const fixCost = item.fixEffortHours * engineerHourlyCost;
  const paybackMonths = fixCost / totalMonthlyCost;
 
  return {
    monthlyOngoingCost,
    expectedMonthlyIncidentCost,
    totalMonthlyCost,
    fixCost,
    paybackMonths,
  };
}
 
// Example: The unindexed query from the war story
const debtExample = calculateDebtROI({
  description: "Missing index on activity_log causing slow dashboard",
  weeklyEngineerHoursCost: 2, // 2 hours/week dealing with timeouts
  incidentRiskPerMonth: 0.3, // 30% chance of full outage per month
  avgIncidentCostDollars: 50000, // Lost revenue + 4 engineers for 4 hours
  fixEffortHours: 1, // Adding an index takes 1 hour
});
 
// Result:
// monthlyOngoingCost: $800
// expectedMonthlyIncidentCost: $15,000
// totalMonthlyCost: $15,800
// fixCost: $100
// paybackMonths: 0.006 (pays for itself in 4 hours)
//
// And yet, somehow, this fix sat in the backlog for 3 months
// because "we don't have time for tech debt right now"

The Decision Framework: When to Do What#

After going through three scaling journeys, I have developed a framework for making infrastructure decisions. The core principle is: add complexity only when the pain of not having it exceeds the cost of maintaining it.

Scale Up vs Scale Out#

SCALE UP (bigger server) when:
  ├── You have < 10K users
  ├── Your bottleneck is CPU or memory, not I/O
  ├── Your team is small (< 5 engineers)
  ├── You don't need high availability yet
  └── Cost: $200 -> $800/month is acceptable

SCALE OUT (more servers) when:
  ├── You need zero-downtime deployments
  ├── A single server cannot handle the load
  ├── You need geographic distribution
  ├── You need fault tolerance (server failure = no impact)
  └── You have the operational maturity to manage it

ADD A CACHE when:
  ├── The same data is read >10x more than it's written
  ├── Response times are unacceptable and DB is the bottleneck
  ├── You can tolerate slightly stale data (even 5 seconds)
  └── NOT when: cache invalidation complexity exceeds the benefit

ADD A MESSAGE QUEUE when:
  ├── You have operations that take >1 second
  ├── You need guaranteed delivery (email, webhooks)
  ├── You need to decouple producers from consumers
  ├── You need rate limiting on downstream services
  └── NOT when: synchronous processing is fast enough

SPLIT A MONOLITH when:
  ├── Different parts need to scale independently
  ├── Different parts have different deployment velocities
  ├── Teams are stepping on each other's toes
  ├── A failure in one area brings down everything
  └── NOT when: your team is < 10 people
  └── NOT when: you're doing it because microservices are cool

The Decision Matrix#

typescript

// A practical tool for deciding when to add infrastructure
 
interface Decision {
  question: string;
  thresholds: {
    doNothing: string;
    simple: string;
    complex: string;
  };
}
 
const decisions: Decision[] = [
  {
    question: "Do you need a load balancer?",
    thresholds: {
      doNothing: "< 1000 RPM, single server is fine",
      simple: "1K-10K RPM: Nginx reverse proxy, 2-3 servers",
      complex: "> 10K RPM: L7 load balancer with health checks, auto-scaling",
    },
  },
  {
    question: "Do you need caching?",
    thresholds: {
      doNothing: "All queries < 50ms, no repeated reads",
      simple: "Add Redis cache-aside for hot data, TTL-based invalidation",
      complex: "Multi-layer caching, cache warming, event-driven invalidation",
    },
  },
  {
    question: "Do you need read replicas?",
    thresholds: {
      doNothing: "Database CPU < 60%, query times acceptable",
      simple: "1 read replica, manual query routing",
      complex: "Multiple replicas, automated failover, geographic distribution",
    },
  },
  {
    question: "Do you need background jobs?",
    thresholds: {
      doNothing: "All operations complete < 200ms",
      simple: "Basic queue for emails and webhooks",
      complex: "Priority queues, DLQ, idempotency, rate limiting, scheduled jobs",
    },
  },
  {
    question: "Do you need to break up the monolith?",
    thresholds: {
      doNothing: "< 5 developers, single deployable unit is manageable",
      simple: "Extract 1-2 high-traffic services",
      complex: "Full microservices with service mesh and distributed tracing",
    },
  },
];

The Deployment Pipeline: From Code to Production#

None of the above matters if you cannot deploy reliably. Here is the pipeline that has served me well at every scale:

typescript

// CI/CD pipeline stages — implement these in GitHub Actions, GitLab CI,
// or whatever you use
 
interface PipelineStage {
  name: string;
  duration: string;
  description: string;
}
 
const pipeline: PipelineStage[] = [
  {
    name: "lint-and-typecheck",
    duration: "1-2 min",
    description: "ESLint + TypeScript compiler. Catches 80% of bugs.",
  },
  {
    name: "unit-tests",
    duration: "2-5 min",
    description: "Fast tests. Mocked dependencies. Run on every push.",
  },
  {
    name: "integration-tests",
    duration: "5-15 min",
    description: "Real database, real Redis. Run on PRs and main.",
  },
  {
    name: "build",
    duration: "3-10 min",
    description: "Build the production artifact. If this fails, nothing deploys.",
  },
  {
    name: "canary-deploy",
    duration: "5-15 min",
    description: "Deploy to 1 server. Route 5% of traffic. Monitor error rate.",
  },
  {
    name: "full-deploy",
    duration: "5-15 min",
    description: "Rolling deployment to all servers. Health checks between batches.",
  },
  {
    name: "smoke-tests",
    duration: "1-2 min",
    description: "Hit critical endpoints in production. Verify they respond.",
  },
];
 
// Canary deployment logic
async function canaryDeploy(config: {
  newVersion: string;
  canaryPercentage: number;
  monitorDurationMs: number;
  errorRateThreshold: number;
}): Promise<boolean> {
  // Deploy to canary servers
  console.log(`Deploying ${config.newVersion} to canary (${config.canaryPercentage}% traffic)`);
 
  // Wait and monitor
  const startTime = Date.now();
  while (Date.now() - startTime < config.monitorDurationMs) {
    await new Promise((r) => setTimeout(r, 10000)); // Check every 10 seconds
 
    const metrics = getRouteMetrics("_global").getMetrics();
 
    if (metrics.errorRate > config.errorRateThreshold) {
      console.error(
        JSON.stringify({
          level: "error",
          msg: "canary_rollback",
          errorRate: metrics.errorRate,
          threshold: config.errorRateThreshold,
          version: config.newVersion,
        }),
      );
      // Rollback canary
      return false;
    }
 
    if (metrics.p99 > 2000) {
      // p99 > 2 seconds
      console.error(
        JSON.stringify({
          level: "error",
          msg: "canary_rollback_latency",
          p99: metrics.p99,
          version: config.newVersion,
        }),
      );
      return false;
    }
  }
 
  console.log(`Canary passed. Proceeding with full deploy.`);
  return true;
}
 
// Zero-downtime rolling deployment
async function rollingDeploy(servers: string[], newVersion: string, batchSize: number = 1): Promise<void> {
  for (let i = 0; i < servers.length; i += batchSize) {
    const batch = servers.slice(i, i + batchSize);
 
    for (const server of batch) {
      // 1. Remove from load balancer
      await removeFromLoadBalancer(server);
      // 2. Wait for in-flight requests to drain
      await waitForDrain(server, 30000);
      // 3. Deploy new version
      await deployToServer(server, newVersion);
      // 4. Health check
      const healthy = await waitForHealthy(server, 60000);
      if (!healthy) {
        throw new Error(`Server ${server} failed health check after deploy`);
      }
      // 5. Add back to load balancer
      await addToLoadBalancer(server);
    }
 
    // Wait between batches to catch issues before continuing
    if (i + batchSize < servers.length) {
      console.log("Waiting 30s between deployment batches...");
      await new Promise((r) => setTimeout(r, 30000));
    }
  }
}
 
async function removeFromLoadBalancer(_server: string): Promise<void> {
  // Implementation depends on your LB (Nginx, HAProxy, cloud LB)
}
 
async function addToLoadBalancer(_server: string): Promise<void> {
  // Implementation depends on your LB
}
 
async function waitForDrain(server: string, timeoutMs: number): Promise<void> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const activeConns = await getActiveConnections(server);
    if (activeConns === 0) return;
    await new Promise((r) => setTimeout(r, 1000));
  }
}
 
async function waitForHealthy(server: string, timeoutMs: number): Promise<boolean> {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    try {
      const response = await fetch(`http://${server}:3000/api/health`);
      if (response.ok) return true;
    } catch {
      // Not ready yet
    }
    await new Promise((r) => setTimeout(r, 2000));
  }
  return false;
}
 
async function deployToServer(_server: string, _version: string): Promise<void> {
  // Pull new image, restart process, etc.
}
 
async function getActiveConnections(_server: string): Promise<number> {
  // Query your metrics endpoint
  return 0;
}

The Final Checklist#

Before you go to sleep tonight, ask yourself these questions about your production system:

RELIABILITY
□ Can you deploy without downtime?
□ Can a single server die without user impact?
□ Do you have automated database backups tested with restore drills?
□ Is your secret rotation automated (or at least documented)?
□ Do you have runbooks for common incidents?

PERFORMANCE
□ Do you know your p95 and p99 latency?
□ Have you run EXPLAIN ANALYZE on your top 10 queries?
□ Is your database connection pool properly sized?
□ Are your most-read data paths cached?
□ Are static assets served from a CDN?

OBSERVABILITY
□ Can you answer "what changed?" when something breaks?
□ Do you have alerts for error rate spikes?
□ Can you trace a request from the load balancer to the database?
□ Are your logs structured and queryable?
□ Do you know your SLO compliance numbers?

SECURITY (often forgotten until it's too late)
□ Are all dependencies up to date?
□ Is SQL injection impossible (parameterized queries everywhere)?
□ Are rate limits in place on all public endpoints?
□ Is sensitive data encrypted at rest and in transit?
□ Do you have a security incident response plan?

COST
□ Do you know your cost per user?
□ Are you paying for resources you're not using?
□ Are spot/preemptible instances used for non-critical workloads?
□ Is your CDN hit rate above 90% for static assets?
□ Are old logs, backups, and data being pruned?

Closing Thoughts#

Scaling is not a technical problem. It is an organizational problem with technical symptoms.

The team that scales successfully is not the one with the best technology stack. It is the one that monitors aggressively, deploys confidently, pays down technical debt before it becomes an emergency, and makes decisions based on data rather than anxiety.

Every recommendation in this guide comes from making the wrong choice first. I have been the engineer who shipped the unindexed query, forgot the connection pool, ignored the memory leak, and deployed without canary testing. Each mistake taught me more than any blog post could.

The irony of a blog post about scaling is that most of you do not need most of this right now. If you have fewer than 10K users, focus on the product. The single best optimization is having users who care enough to break your system.

But when they do, come back to this guide. It will be here.

Build for 10x your current load. Plan for 100x. Worry about 1000x when you actually get there.

The Scaling Journey: What Breaks and When#

0-100 Users: The "Everything Works" Phase#

100-1K Users: The First Real Problems#

1K-10K Users: You Need a Real Architecture#

10K-100K Users: The Architecture Phase Transition#

100K-1M Users: You Are an Infrastructure Company Now#

Database Scaling: The Foundation of Everything#

Connection Pooling: The First Thing That Breaks#

The N+1 Problem: Death by a Thousand Queries#

Indexing Strategy: Think Like the Query Planner#

Table Partitioning: When Your Tables Get Too Big#

Materialized Views: Precomputing Expensive Queries#

Caching Architecture: The Layer Between You and Disaster#

Cache-Aside Pattern: The Workhorse#

Multi-Layer Caching Strategy#

Cache Invalidation: The Two Hard Problems#

Load Balancing and Horizontal Scaling#

Nginx as a Load Balancer#

Graceful Shutdown: The Difference Between "Deploying" and "Breaking Things"#

Health Checks: Letting the Load Balancer Know You Are Alive#

Background Jobs and Message Queues#

Building a Production Job Queue#

Idempotency: Because Things Will Be Processed Twice#

CDN and Edge Computing#

Static Asset Optimization Pipeline#

Edge Caching with Surrogate Keys#

Monitoring and Observability at Scale#

Structured Logging That Actually Helps#

The RED Method: What to Measure#

SLOs and SLIs: Defining "Good Enough"#

The Mistakes That Kill Startups#

The Unindexed Query That Cost $200K#

The Missing Connection Pool#

The Cache Stampede#

The Memory Leak That Only Shows Up at Scale#

The Retry Storm#

Cost Optimization: Spend Money Where It Matters#

The Cost Hierarchy#

Right-Sizing: Stop Paying for Idle Resources#

The Real Cost of Technical Debt#

The Decision Framework: When to Do What#

Scale Up vs Scale Out#

The Decision Matrix#

The Deployment Pipeline: From Code to Production#

The Final Checklist#

Closing Thoughts#

Artikel Terkait

YAML Validator Guide: Catch Config Errors Before Deploy

YAML Validator Guide for Configuration Debugging