The definitive guide to scaling a web application from 0 to 1M+ users. Connection pooling, caching layers, load balancing, background jobs, edge computing, monitoring, and the real war stories that nobody puts in their blog posts.
Your application works perfectly on your laptop. It handles your demo beautifully. Your investor pitch is flawless. Then real users show up, and everything you thought you knew about software engineering turns out to be a fairy tale you told yourself while localhost returned responses in 3ms.
I have scaled three different applications from near-zero to over a million active users. Each time, I thought I knew what was coming. Each time, the system found new and creative ways to prove me wrong. The bottlenecks are never where you expect them. The failures are never what you planned for. And the solutions that work at 10K users will actively hurt you at 100K.
This is the playbook I wish someone had given me before the first 3 AM wake-up call. Every code example is from production. Every war story actually happened. Every recommendation comes from painful experience, not from reading somebody else's blog post about "web scale."
Scaling is not a continuous process. It is a series of phase transitions. Your application works fine, works fine, works fine, and then suddenly it does not. Understanding these breakpoints saves you months of reactive firefighting.
At this stage, your single server handles everything. PostgreSQL, your application, Redis, maybe even Nginx — all on the same box. Response times are fast because there is no contention. The database returns queries in microseconds because the entire dataset fits in the buffer cache. You feel smart.
The only thing that matters here: do not add complexity you do not need yet. I have seen teams set up Kubernetes clusters for 50 users. They spent three months on infrastructure and zero months on product. Then they shut down because nobody wanted the product.
// At 100 users, this is your entire infrastructure
// One server, one database, one process
import express from 'express';
import { Pool } from 'pg';
const pool = new Pool({
host: 'localhost', // yes, localhost. the db is on the same box
max: 10, // 10 connections is plenty
idleTimeoutMillis: 30000,
});
const app = express();
app.get('/api/users/:id', async (req, res) => {
const { rows } = await pool.query(
'SELECT id, name, email FROM users WHERE id = $1',
[req.params.id]
);
res.json(rows[0]);
});
app.listen(3000);
// This handles 100 users without breaking a sweat

This is where you discover that the queries you wrote during prototyping are garbage. Table scans on 50K rows. Missing indexes on foreign keys. N+1 queries in every list endpoint. Your response times go from 50ms to 2 seconds, and you have no idea why because you do not have monitoring.
What breaks: full table scans on growing tables, missing indexes on foreign keys, N+1 queries in list endpoints, and unpaginated responses that load entire tables into memory.
What to fix: add indexes that match your access patterns, paginate every list endpoint, and select only the columns you actually return.
// The query that kills you at 1K users
// This is what every ORM generates by default
// BAD: Full table scan, no pagination
app.get('/api/posts', async (req, res) => {
const posts = await db.query('SELECT * FROM posts ORDER BY created_at DESC');
// You are loading every post into memory
// At 50K posts, this takes 3 seconds and uses 200MB of RAM
res.json(posts.rows);
});
// GOOD: Indexed, paginated, selecting only what you need
app.get('/api/posts', async (req, res) => {
const cursor = req.query.cursor;
const limit = Math.min(Number(req.query.limit) || 20, 100);
const { rows } = await db.query(
`SELECT id, title, excerpt, author_id, created_at
FROM posts
WHERE ($1::timestamptz IS NULL OR created_at < $1)
ORDER BY created_at DESC
LIMIT $2`,
[cursor || null, limit + 1]
);
const hasMore = rows.length > limit;
const posts = hasMore ? rows.slice(0, -1) : rows;
res.json({
posts,
nextCursor: hasMore ? posts[posts.length - 1].created_at : null,
});
});
// And the index that makes it work
// CREATE INDEX idx_posts_created_at ON posts (created_at DESC);Your single server is running hot. CPU is at 80% during peak hours. The database connection pool is constantly maxed out. You are getting occasional timeouts. Users complain that "the site is slow sometimes."
What breaks: the connection pool is exhausted during peak hours, slow queries starve everything queued behind them, and timeouts surface as "the site is slow sometimes."
What to fix: move the database to its own server, configure the pool with explicit connection and query timeouts, and log pool statistics so you see pressure building before users do.
// At this stage, you need proper connection pooling
// This is not optional — it is the difference between working and not
import { Pool } from 'pg';
const pool = new Pool({
host: process.env.DB_HOST, // separate DB server now
port: 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // per-process limit
min: 5, // keep connections warm
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000, // fail fast if pool exhausted
statement_timeout: '10000', // kill queries over 10 seconds
// THIS IS THE ONE EVERYONE FORGETS
// Without it, a single slow query can starve the pool
query_timeout: 15000,
});
// Monitor your pool — you'll thank me later
setInterval(() => {
const { totalCount, idleCount, waitingCount } = pool;
console.log(JSON.stringify({
level: 'info',
msg: 'pool_stats',
total: totalCount,
idle: idleCount,
waiting: waitingCount,
active: totalCount - idleCount,
}));
if (waitingCount > 5) {
console.log(JSON.stringify({
level: 'warn',
msg: 'pool_pressure',
waiting: waitingCount,
}));
}
}, 10000);

This is where amateur hour ends. Everything that was "good enough" stops being good enough. Your database needs read replicas. Your caching needs to be strategic, not opportunistic. Your deployments need to be zero-downtime. You need real monitoring, not "I'll check the logs when something breaks."
What breaks: a single database can no longer serve both reads and writes, opportunistic caching falls over under load, and deployments cause visible downtime.
What to fix: add read replicas with query routing, fall back to the primary when a replica fails, and log slow queries with their target so you know which side of the cluster is hurting.
// Database query routing between primary and replicas
// This is the pattern that lets you scale reads independently
import { Pool } from 'pg';
interface DatabaseCluster {
primary: Pool;
replicas: Pool[];
currentReplica: number;
}
function createCluster(): DatabaseCluster {
const primary = new Pool({
host: process.env.DB_PRIMARY_HOST,
max: 20,
min: 5,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
});
const replicaHosts = (process.env.DB_REPLICA_HOSTS || '').split(',');
const replicas = replicaHosts.map(host =>
new Pool({
host,
max: 30, // replicas handle more connections (read-heavy)
min: 5,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
})
);
return { primary, replicas, currentReplica: 0 };
}
const cluster = createCluster();
// Round-robin replica selection
function getReadPool(): Pool {
if (cluster.replicas.length === 0) return cluster.primary;
const pool = cluster.replicas[cluster.currentReplica];
cluster.currentReplica = (cluster.currentReplica + 1) % cluster.replicas.length;
return pool;
}
// The key insight: route by operation type
export async function query(
sql: string,
params?: unknown[],
options?: { forcePrimary?: boolean }
) {
// Writes always go to primary
// Reads go to replicas unless you need strong consistency
const isWrite = /^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i.test(sql);
const pool = isWrite || options?.forcePrimary
? cluster.primary
: getReadPool();
const start = performance.now();
try {
const result = await pool.query(sql, params);
const duration = performance.now() - start;
if (duration > 1000) {
console.log(JSON.stringify({
level: 'warn',
msg: 'slow_query',
sql: sql.substring(0, 200),
duration_ms: Math.round(duration),
target: isWrite ? 'primary' : 'replica',
}));
}
return result;
} catch (error) {
// If a replica fails, fall back to primary
if (!isWrite && !options?.forcePrimary) {
console.log(JSON.stringify({
level: 'warn',
msg: 'replica_fallback',
error: (error as Error).message,
}));
return cluster.primary.query(sql, params);
}
throw error;
}
}

Congratulations, you are no longer building a product. You are operating a distributed system, and distributed systems have failure modes that will make you question your career choices.
What breaks: anything with a single point of failure — database connection limits, replication lag, cache stampedes, and failures that only appear when several components degrade at once.
What to fix: put a connection pooler in front of the database, make caching a deliberate strategy with stampede protection, and treat the database as the bottleneck it almost always is.
I am going to be direct: 90% of scaling problems are database problems in disguise. Your application is stateless (or should be). Your cache is ephemeral. But your database is where truth lives, and truth is expensive to compute, dangerous to duplicate, and devastating to lose.
Every database connection consumes memory on the database server — typically 5-10MB per connection for PostgreSQL. With 10 application servers each maintaining a pool of 20 connections, you have 200 connections. PostgreSQL's default max_connections is 100. You do the math.
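The arithmetic above is worth encoding as a deploy-time sanity check. A minimal sketch — the function name and all numbers are illustrative, not recommendations:

```typescript
// Sketch: verify that aggregate pool demand fits the database's
// connection budget before it bites you in production.
function connectionBudget(opts: {
  appServers: number;        // processes that each open a pool
  poolMaxPerProcess: number; // `max` on each pg Pool
  dbMaxConnections: number;  // PostgreSQL max_connections
  reservedForAdmin: number;  // superuser slots, migrations, psql sessions
}) {
  const demand = opts.appServers * opts.poolMaxPerProcess;
  const supply = opts.dbMaxConnections - opts.reservedForAdmin;
  return { demand, supply, fits: demand <= supply };
}

// 10 app servers x 20 connections = 200 demanded,
// against PostgreSQL's default max_connections of 100:
const check = connectionBudget({
  appServers: 10,
  poolMaxPerProcess: 20,
  dbMaxConnections: 100,
  reservedForAdmin: 3,
});
// check.demand === 200, check.supply === 97, check.fits === false
```

Run a check like this in CI or at startup; it fails loudly when someone scales the app tier without thinking about the database.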
The solution is an external connection pooler. It sits between your application and your database, multiplexing hundreds of application connections onto a smaller number of database connections.
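As a concrete sketch, here is what that looks like with PgBouncer in transaction pooling mode. Host names, pool sizes, and file paths are illustrative assumptions — tune them against your own workload:

```ini
; Sketch of a PgBouncer config — values are illustrative, not prescriptive
[databases]
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Transaction pooling: a server connection is held only for the
; duration of a transaction, so hundreds of client connections
; multiplex onto a few dozen database connections
pool_mode = transaction

max_client_conn = 1000     ; app-side connections PgBouncer will accept
default_pool_size = 25     ; server connections per database/user pair
reserve_pool_size = 5      ; emergency headroom under load
server_idle_timeout = 60
```

One caveat: transaction pooling breaks session-level state — session-scoped prepared statements, SET commands, and advisory locks do not survive across transactions — so audit your driver settings before switching it on.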
-- PostgreSQL configuration for a server handling 100K+ users
-- These are the settings that actually matter
-- Connection limits
max_connections = 300 -- Don't go higher; use a pooler instead
superuser_reserved_connections = 3
-- Memory (for a 16GB RAM server)
shared_buffers = '4GB' -- 25% of RAM
effective_cache_size = '12GB' -- 75% of RAM
work_mem = '64MB' -- Per-operation sort/hash memory
maintenance_work_mem = '512MB' -- For VACUUM, CREATE INDEX
-- WAL and replication
wal_level = 'replica'
max_wal_senders = 10
max_replication_slots = 10
wal_buffers = '64MB'
checkpoint_completion_target = 0.9
-- Query planner
random_page_cost = 1.1 -- Assuming SSD storage
effective_io_concurrency = 200 -- SSD can handle parallel I/O
default_statistics_target = 200 -- Better query plans, slightly slower ANALYZE
-- Logging (you need this for debugging)
log_min_duration_statement = 500 -- Log queries over 500ms
log_checkpoints = on
log_connections = off -- Too noisy at scale
log_disconnections = off
log_lock_waits = on
log_temp_files = 0 -- Log ALL temp file usage

Here is how to implement application-level connection pooling properly:
// Production connection pool with health checking and metrics
import { Pool, PoolClient, QueryResult } from 'pg';
interface PoolMetrics {
totalQueries: number;
failedQueries: number;
slowQueries: number;
poolExhausted: number;
averageQueryTime: number;
}
class DatabasePool {
private pool: Pool;
private metrics: PoolMetrics = {
totalQueries: 0,
failedQueries: 0,
slowQueries: 0,
poolExhausted: 0,
averageQueryTime: 0,
};
private queryTimes: number[] = [];
private readonly SLOW_QUERY_MS = 1000;
private readonly MAX_QUERY_TIME_SAMPLES = 1000;
constructor(config: {
host: string;
port: number;
database: string;
user: string;
password: string;
maxConnections: number;
}) {
this.pool = new Pool({
host: config.host,
port: config.port,
database: config.database,
user: config.user,
password: config.password,
max: config.maxConnections,
min: Math.max(2, Math.floor(config.maxConnections / 4)),
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
allowExitOnIdle: false,
});
this.pool.on('error', (err) => {
// Idle client errors — connection dropped by server/network
console.error(JSON.stringify({
level: 'error',
msg: 'pool_idle_error',
error: err.message,
}));
});
this.pool.on('connect', () => {
// New connection established — useful for debugging pool behavior
});
}
async query<T = unknown>(
sql: string,
params?: unknown[]
): Promise<QueryResult<T>> {
const start = performance.now();
this.metrics.totalQueries++;
// Check pool pressure before acquiring
if (this.pool.waitingCount > 10) {
this.metrics.poolExhausted++;
console.log(JSON.stringify({
level: 'warn',
msg: 'pool_backpressure',
waiting: this.pool.waitingCount,
sql: sql.substring(0, 100),
}));
}
try {
const result = await this.pool.query<T>(sql, params);
const duration = performance.now() - start;
this.recordQueryTime(duration);
if (duration > this.SLOW_QUERY_MS) {
this.metrics.slowQueries++;
console.log(JSON.stringify({
level: 'warn',
msg: 'slow_query',
duration_ms: Math.round(duration),
sql: sql.substring(0, 200),
rows: result.rowCount,
}));
}
return result;
} catch (error) {
this.metrics.failedQueries++;
const duration = performance.now() - start;
console.error(JSON.stringify({
level: 'error',
msg: 'query_error',
duration_ms: Math.round(duration),
sql: sql.substring(0, 200),
error: (error as Error).message,
}));
throw error;
}
}
// Transaction support with automatic rollback
async transaction<T>(
fn: (client: PoolClient) => Promise<T>
): Promise<T> {
const client = await this.pool.connect();
try {
await client.query('BEGIN');
const result = await fn(client);
await client.query('COMMIT');
return result;
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
}
getMetrics(): PoolMetrics {
return {
...this.metrics,
averageQueryTime: this.calculateAvgQueryTime(),
};
}
getPoolStats() {
return {
total: this.pool.totalCount,
idle: this.pool.idleCount,
waiting: this.pool.waitingCount,
active: this.pool.totalCount - this.pool.idleCount,
};
}
private recordQueryTime(duration: number) {
this.queryTimes.push(duration);
if (this.queryTimes.length > this.MAX_QUERY_TIME_SAMPLES) {
this.queryTimes.shift();
}
}
private calculateAvgQueryTime(): number {
if (this.queryTimes.length === 0) return 0;
const sum = this.queryTimes.reduce((a, b) => a + b, 0);
return Math.round(sum / this.queryTimes.length);
}
async healthCheck(): Promise<boolean> {
try {
const result = await this.pool.query('SELECT 1 as health');
return result.rows[0]?.health === 1;
} catch {
return false;
}
}
async shutdown(): Promise<void> {
await this.pool.end();
}
}
export const db = new DatabasePool({
host: process.env.DB_HOST!,
port: Number(process.env.DB_PORT) || 5432,
database: process.env.DB_NAME!,
user: process.env.DB_USER!,
password: process.env.DB_PASSWORD!,
maxConnections: Number(process.env.DB_MAX_CONNECTIONS) || 20,
});

The N+1 problem is the single most common performance issue in web applications, and it is almost always introduced by ORMs that make it too easy to be lazy.
// THE N+1 PROBLEM: What most APIs actually do
// A request for "list posts with authors" generates:
// 1 query: SELECT * FROM posts LIMIT 20
// 20 queries: SELECT * FROM users WHERE id = ? (one per post)
// Total: 21 queries for 20 posts
// BAD: The ORM makes this look innocent
async function getPostsWithAuthors() {
const posts = await Post.findAll({ limit: 20 });
// This triggers 20 additional queries behind the scenes
for (const post of posts) {
post.author = await User.findByPk(post.authorId);
}
return posts;
}
// GOOD: One query with a JOIN
async function getPostsWithAuthors(db: DatabasePool) {
const { rows } = await db.query(`
SELECT
p.id,
p.title,
p.excerpt,
p.created_at,
json_build_object(
'id', u.id,
'name', u.name,
'avatar_url', u.avatar_url
) as author
FROM posts p
INNER JOIN users u ON u.id = p.author_id
ORDER BY p.created_at DESC
LIMIT 20
`);
return rows;
}
// EVEN BETTER: Batch loading with DataLoader pattern
// This is what Facebook built for GraphQL, and it works for REST too
class BatchLoader<K, V> {
private batch: Map<K, {
resolve: (value: V | null) => void;
reject: (error: Error) => void;
}[]> = new Map();
private timer: ReturnType<typeof setTimeout> | null = null;
constructor(
private loadFn: (keys: K[]) => Promise<Map<K, V>>
) {}
async load(key: K): Promise<V | null> {
return new Promise((resolve, reject) => {
const existing = this.batch.get(key) || [];
existing.push({ resolve, reject });
this.batch.set(key, existing);
if (!this.timer) {
// Batch all loads within the same tick
this.timer = setTimeout(() => this.executeBatch(), 0);
}
});
}
private async executeBatch() {
const batch = this.batch;
this.batch = new Map();
this.timer = null;
const keys = Array.from(batch.keys());
try {
const results = await this.loadFn(keys);
for (const [key, callbacks] of batch) {
const value = results.get(key) ?? null;
callbacks.forEach(cb => cb.resolve(value));
}
} catch (error) {
for (const callbacks of batch.values()) {
callbacks.forEach(cb => cb.reject(error as Error));
}
}
}
}
// Usage
const userLoader = new BatchLoader<string, User>(async (ids) => {
const { rows } = await db.query(
`SELECT * FROM users WHERE id = ANY($1)`,
[ids]
);
return new Map(rows.map(u => [u.id, u]));
});
// Now 20 calls to userLoader.load(id) result in 1 query
// SELECT * FROM users WHERE id = ANY('{id1,id2,...,id20}')

Most developers add indexes reactively — a query is slow, they add an index, it gets faster. That works until you have 50 indexes on a table and your write throughput drops to zero because every INSERT has to update all 50 indexes.
-- INDEXING STRATEGY FOR A SOCIAL MEDIA APP AT SCALE
-- Rule 1: Index what you filter, join, and order by
-- But ONLY those columns, and in the right order
-- Posts table
-- Primary query: "Get recent posts by user"
CREATE INDEX idx_posts_user_created
ON posts (user_id, created_at DESC);
-- This covers: WHERE user_id = X ORDER BY created_at DESC
-- The column order matters! user_id first because it's the equality filter
-- Rule 2: Partial indexes for common filters
-- 90% of queries only want published posts
CREATE INDEX idx_posts_published_recent
ON posts (created_at DESC)
WHERE status = 'published';
-- This index is smaller and faster than indexing all posts
-- Rule 3: Covering indexes to avoid table lookups
-- If a query can be satisfied entirely from the index, PostgreSQL
-- never touches the heap (the actual table data)
CREATE INDEX idx_posts_feed_covering
ON posts (user_id, created_at DESC)
INCLUDE (title, excerpt, like_count);
-- Now "get feed items" never reads the table
-- Rule 4: Expression indexes for computed filters
-- You search by lowercase email everywhere
CREATE UNIQUE INDEX idx_users_email_lower
ON users (LOWER(email));
-- Make sure your queries use LOWER(email) too
-- Rule 5: GIN indexes for full-text search and JSONB
CREATE INDEX idx_posts_search
ON posts USING gin(to_tsvector('english', title || ' ' || body));
-- For JSONB columns
CREATE INDEX idx_user_preferences
ON users USING gin(preferences jsonb_path_ops);
-- ANTI-PATTERNS: Indexes that hurt more than they help
-- BAD: Indexing boolean columns with low selectivity
-- CREATE INDEX idx_posts_active ON posts (is_active);
-- If 95% of posts are active, this index is useless
-- PostgreSQL will do a seq scan anyway because the index isn't selective
-- BAD: Too many indexes on a write-heavy table
-- Every INSERT/UPDATE/DELETE must update ALL indexes
-- 10 indexes on a table that gets 1000 writes/second = 10,000 index updates/second
-- DIAGNOSTIC: Find unused indexes (run periodically)
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
AND indexrelname NOT LIKE '%_unique'
ORDER BY pg_relation_size(indexrelid) DESC;
-- DIAGNOSTIC: Find missing indexes (queries doing seq scans)
SELECT
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
CASE WHEN seq_scan > 0
THEN seq_tup_read / seq_scan
ELSE 0
END AS avg_rows_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_tup_read DESC
LIMIT 20;

When a single table grows past tens of millions of rows, even indexed queries start slowing down. Partitioning splits one logical table into multiple physical tables, and PostgreSQL transparently routes queries to the right partition.
-- Time-based partitioning for an events/analytics table
-- This is the most common and most useful partitioning strategy
CREATE TABLE events (
id bigserial,
user_id uuid NOT NULL,
event_type text NOT NULL,
payload jsonb,
created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Indexes on partitions (inherited automatically since PG 11)
CREATE INDEX ON events (user_id, created_at DESC);
CREATE INDEX ON events (event_type, created_at DESC);
-- Auto-create future partitions with a cron job
-- This is the script you run monthly:
DO $$
DECLARE
partition_date date;
partition_name text;
start_date text;
end_date text;
BEGIN
-- Create partitions for the next 3 months
FOR i IN 0..2 LOOP
partition_date := date_trunc('month', now() + (i || ' months')::interval);
partition_name := 'events_' || to_char(partition_date, 'YYYY_MM');
start_date := to_char(partition_date, 'YYYY-MM-DD');
end_date := to_char(partition_date + '1 month'::interval, 'YYYY-MM-DD');
IF NOT EXISTS (
SELECT 1 FROM pg_class WHERE relname = partition_name
) THEN
EXECUTE format(
'CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
RAISE NOTICE 'Created partition %', partition_name;
END IF;
END LOOP;
END $$;
-- Dropping old data is now instant — just drop the partition
-- Instead of: DELETE FROM events WHERE created_at < '2025-01-01'
-- (which locks the table and takes hours)
-- Do this:
DROP TABLE events_2025_01;
-- Instant. No locks. No vacuum needed.

Some queries are inherently expensive. Aggregations across millions of rows, complex joins across multiple tables, analytics dashboards. You cannot make them fast with indexes alone. Materialized views precompute the results and let you query a flat table instead.
-- Dashboard analytics that would take 30 seconds to compute live
CREATE MATERIALIZED VIEW mv_daily_stats AS
SELECT
date_trunc('day', e.created_at) AS day,
COUNT(DISTINCT e.user_id) AS unique_users,
COUNT(*) AS total_events,
COUNT(*) FILTER (WHERE e.event_type = 'purchase') AS purchases,
COALESCE(SUM(
CASE WHEN e.event_type = 'purchase'
THEN (e.payload->>'amount')::numeric
ELSE 0 END
), 0) AS revenue,
COUNT(*) FILTER (WHERE e.event_type = 'signup') AS signups,
ROUND(
COUNT(*) FILTER (WHERE e.event_type = 'purchase')::numeric /
NULLIF(COUNT(DISTINCT e.user_id), 0) * 100, 2
) AS conversion_rate
FROM events e
WHERE e.created_at >= now() - INTERVAL '90 days'
GROUP BY date_trunc('day', e.created_at)
ORDER BY day DESC
WITH DATA;
-- Unique index enables CONCURRENTLY refresh (no locks)
CREATE UNIQUE INDEX ON mv_daily_stats (day);
-- Refresh without blocking readers
-- Run this via cron every 15 minutes
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_stats;
-- Now your dashboard query goes from 30 seconds to 5ms:
SELECT * FROM mv_daily_stats WHERE day >= now() - INTERVAL '30 days';

// Automated materialized view refresh with error handling
interface MaterializedViewConfig {
name: string;
refreshIntervalMs: number;
concurrent: boolean;
}
class MaterializedViewRefresher {
private timers: Map<string, ReturnType<typeof setInterval>> = new Map();
constructor(
private db: DatabasePool,
private views: MaterializedViewConfig[]
) {}
start() {
for (const view of this.views) {
const timer = setInterval(
() => this.refresh(view),
view.refreshIntervalMs
);
this.timers.set(view.name, timer);
// Also refresh immediately on start
this.refresh(view).catch(() => {});
}
}
private async refresh(view: MaterializedViewConfig) {
const start = performance.now();
const concurrent = view.concurrent ? 'CONCURRENTLY' : '';
try {
await this.db.query(
`REFRESH MATERIALIZED VIEW ${concurrent} ${view.name}`
);
const duration = performance.now() - start;
console.log(JSON.stringify({
level: 'info',
msg: 'mv_refreshed',
view: view.name,
duration_ms: Math.round(duration),
}));
} catch (error) {
const duration = performance.now() - start;
console.error(JSON.stringify({
level: 'error',
msg: 'mv_refresh_failed',
view: view.name,
duration_ms: Math.round(duration),
error: (error as Error).message,
}));
}
}
stop() {
for (const timer of this.timers.values()) {
clearInterval(timer);
}
this.timers.clear();
}
}
// Usage
const refresher = new MaterializedViewRefresher(db, [
{ name: 'mv_daily_stats', refreshIntervalMs: 15 * 60 * 1000, concurrent: true },
{ name: 'mv_user_activity', refreshIntervalMs: 5 * 60 * 1000, concurrent: true },
{ name: 'mv_popular_posts', refreshIntervalMs: 60 * 1000, concurrent: true },
]);
refresher.start();

Caching is easy to add and incredibly hard to get right. The gap between "add a Redis cache" and "implement a caching strategy that does not serve stale data, does not stampede your database, and does not use more memory than your dataset" is enormous.
Cache-aside is the most common caching pattern. The application checks the cache first; on miss, it loads from the database and populates the cache. Simple, but the devil is in the details.
import Redis from 'ioredis';
const redis = new Redis({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT) || 6379,
password: process.env.REDIS_PASSWORD,
maxRetriesPerRequest: 3,
retryStrategy(times) {
const delay = Math.min(times * 50, 2000);
return delay;
},
// THIS IS CRITICAL: enable lazy connect so the app starts
// even if Redis is down
lazyConnect: true,
enableReadyCheck: true,
});
// The naive approach everyone starts with
async function getUserNaive(id: string): Promise<User | null> {
const cached = await redis.get(`user:${id}`);
if (cached) return JSON.parse(cached);
const { rows } = await db.query(
'SELECT * FROM users WHERE id = $1', [id]
);
if (rows[0]) {
await redis.set(`user:${id}`, JSON.stringify(rows[0]), 'EX', 3600);
}
return rows[0] || null;
}
// What is wrong with the naive approach:
// 1. No protection against cache stampede
// 2. No circuit breaker when Redis is down
// 3. No serialization error handling
// 4. No cache warming
// 5. Stale data served indefinitely if invalidation fails
// The production version:
interface CacheOptions {
ttlSeconds: number;
staleWhileRevalidateSeconds?: number;
lockTimeoutMs?: number;
}
class CacheManager {
private redis: Redis;
private localCache: Map<string, { value: string; expiry: number }> = new Map();
private readonly LOCAL_TTL_MS = 5000; // L1 cache: 5 seconds
private redisHealthy = true;
constructor(redis: Redis) {
this.redis = redis;
// Health monitoring
this.redis.on('error', () => { this.redisHealthy = false; });
this.redis.on('ready', () => { this.redisHealthy = true; });
// Clean expired local cache entries every 30 seconds
setInterval(() => this.cleanLocalCache(), 30000);
}
async get<T>(
key: string,
loader: () => Promise<T>,
options: CacheOptions
): Promise<T> {
// L1: Check local in-memory cache first (no network call)
const local = this.localCache.get(key);
if (local && local.expiry > Date.now()) {
return JSON.parse(local.value) as T;
}
// L2: Check Redis
if (this.redisHealthy) {
try {
const cached = await this.redis.get(key);
if (cached !== null) {
// Populate L1
this.localCache.set(key, {
value: cached,
expiry: Date.now() + this.LOCAL_TTL_MS,
});
// Check if we should revalidate in background
if (options.staleWhileRevalidateSeconds) {
const ttl = await this.redis.ttl(key);
const staleThreshold =
options.ttlSeconds - options.staleWhileRevalidateSeconds;
if (ttl > 0 && ttl < staleThreshold) {
// Serve stale, refresh in background
this.revalidateInBackground(key, loader, options);
}
}
return JSON.parse(cached) as T;
}
} catch {
// Redis failed, fall through to loader
}
}
// Cache miss — load from source with stampede protection
return this.loadWithLock(key, loader, options);
}
private async loadWithLock<T>(
key: string,
loader: () => Promise<T>,
options: CacheOptions
): Promise<T> {
const lockKey = `lock:${key}`;
const lockTimeout = options.lockTimeoutMs || 5000;
if (this.redisHealthy) {
// Try to acquire lock (only one process loads from DB)
const acquired = await this.redis.set(
lockKey, '1', 'PX', lockTimeout, 'NX'
);
if (!acquired) {
// Another process is loading — wait and retry from cache
await new Promise(resolve => setTimeout(resolve, 100));
const cached = await this.redis.get(key);
if (cached) return JSON.parse(cached) as T;
// Still no cache — load anyway (lock holder might have failed)
}
}
try {
const value = await loader();
const serialized = JSON.stringify(value);
// Populate both cache layers
this.localCache.set(key, {
value: serialized,
expiry: Date.now() + this.LOCAL_TTL_MS,
});
if (this.redisHealthy) {
await this.redis.set(key, serialized, 'EX', options.ttlSeconds);
}
return value;
} finally {
if (this.redisHealthy) {
await this.redis.del(lockKey).catch(() => {});
}
}
}
private async revalidateInBackground<T>(
key: string,
loader: () => Promise<T>,
options: CacheOptions
) {
// Fire and forget — do not await
this.loadWithLock(key, loader, options).catch((error) => {
console.error(JSON.stringify({
level: 'error',
msg: 'background_revalidation_failed',
key,
error: (error as Error).message,
}));
});
}
async invalidate(...keys: string[]) {
// Invalidate both layers
for (const key of keys) {
this.localCache.delete(key);
}
if (this.redisHealthy && keys.length > 0) {
await this.redis.del(...keys);
}
}
async invalidatePattern(pattern: string) {
// Invalidate local cache entries matching pattern
const regex = new RegExp(
'^' + pattern.replace(/\*/g, '.*') + '$'
);
for (const key of this.localCache.keys()) {
if (regex.test(key)) this.localCache.delete(key);
}
// Invalidate Redis entries
if (this.redisHealthy) {
let cursor = '0';
do {
const [nextCursor, keys] = await this.redis.scan(
cursor, 'MATCH', pattern, 'COUNT', 100
);
cursor = nextCursor;
if (keys.length > 0) {
await this.redis.del(...keys);
}
} while (cursor !== '0');
}
}
private cleanLocalCache() {
const now = Date.now();
for (const [key, entry] of this.localCache) {
if (entry.expiry < now) {
this.localCache.delete(key);
}
}
}
}
// Usage
const cache = new CacheManager(redis);
async function getUser(id: string): Promise<User | null> {
return cache.get(
`user:${id}`,
async () => {
const { rows } = await db.query(
'SELECT * FROM users WHERE id = $1', [id]
);
return rows[0] || null;
},
{
ttlSeconds: 3600,
staleWhileRevalidateSeconds: 300,
lockTimeoutMs: 5000,
}
);
}
// On user update, invalidate
async function updateUser(id: string, data: Partial<User>) {
await db.query(
'UPDATE users SET name = $2, email = $3 WHERE id = $1',
[id, data.name, data.email]
);
await cache.invalidate(`user:${id}`);
// Also invalidate any list caches that might include this user
await cache.invalidatePattern(`user-list:*`);
}

Production applications need multiple cache layers. Each layer has different characteristics:
// The full caching stack, from fastest to slowest:
//
// L0: CDN Edge Cache — 50ms globally, 0 server load
// L1: In-Memory Cache — <1ms, per-process, small capacity
// L2: Redis Cache — 1-5ms, shared across processes, large capacity
// L3: Database — 5-50ms, persistent, source of truth
//
// Strategy: Popular data should "float up" to faster layers
interface MultiLayerCacheConfig {
key: string;
l1TtlMs: number; // In-memory TTL
l2TtlSeconds: number; // Redis TTL
cdnMaxAge?: number; // CDN cache-control max-age
cdnStaleWhileRevalidate?: number;
}
// HTTP response with appropriate cache headers for each layer
function setCacheHeaders(
res: Response,
config: { isPublic: boolean; maxAge: number; staleWhileRevalidate?: number }
) {
const directives: string[] = [];
if (config.isPublic) {
directives.push('public');
} else {
directives.push('private');
}
directives.push(`max-age=${config.maxAge}`);
if (config.staleWhileRevalidate) {
directives.push(`stale-while-revalidate=${config.staleWhileRevalidate}`);
}
res.setHeader('Cache-Control', directives.join(', '));
// Vary header prevents cache poisoning
res.setHeader('Vary', 'Accept-Encoding, Accept-Language');
}
// Example: Product listing page
// CDN: 60s max-age, 300s stale-while-revalidate
// Redis: 300s TTL
// DB: Fallback source of truth
async function getProducts(req: Request, res: Response) {
const page = Number(req.query.page) || 1;
const category = req.query.category as string;
const cacheKey = `products:${category}:page${page}`;
const products = await cache.get(
cacheKey,
async () => {
const { rows } = await db.query(
`SELECT id, name, price, image_url, rating
FROM products
WHERE category = $1 AND active = true
ORDER BY popularity DESC
LIMIT 20 OFFSET $2`,
[category, (page - 1) * 20]
);
return rows;
},
{ ttlSeconds: 300, staleWhileRevalidateSeconds: 60 }
);
// Tell CDN and browser how to cache this response
setCacheHeaders(res, {
isPublic: true, // no user-specific data
maxAge: 60, // CDN serves cached version for 60s
staleWhileRevalidate: 300, // serve stale for up to 300s while refreshing
});
res.json(products);
}
// Example: User profile (private, no CDN)
async function getUserProfile(req: Request, res: Response) {
const userId = req.user.id;
const profile = await cache.get(
`profile:${userId}`,
async () => {
const { rows } = await db.query(
`SELECT u.*, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.id = $1
GROUP BY u.id`,
[userId]
);
return rows[0];
},
{ ttlSeconds: 600, staleWhileRevalidateSeconds: 120 }
);
setCacheHeaders(res, {
isPublic: false, // user-specific data — no CDN
maxAge: 0, // browser always revalidates
});
res.json(profile);
}

Phil Karlton was right: cache invalidation is one of the two hard things in computer science. Here are the patterns that actually work in production:
// Pattern 1: Event-driven invalidation
// When data changes, publish an event, and cache subscribers handle it
import { EventEmitter } from 'events';
class CacheInvalidationBus {
private emitter = new EventEmitter();
// Register invalidation rules
constructor(private cache: CacheManager) {
// When a user changes, invalidate user cache and any derived caches
this.on('user:updated', async (userId: string) => {
await cache.invalidate(`user:${userId}`, `profile:${userId}`);
await cache.invalidatePattern(`user-list:*`);
await cache.invalidatePattern(`leaderboard:*`);
});
// When a post changes, invalidate the post and related feeds
this.on('post:updated', async (postId: string, authorId: string) => {
await cache.invalidate(`post:${postId}`);
await cache.invalidatePattern(`feed:*`);
await cache.invalidatePattern(`posts:author:${authorId}:*`);
});
// When a comment is added, invalidate the post's comment count
this.on('comment:created', async (postId: string) => {
await cache.invalidate(`post:${postId}`, `comments:${postId}`);
});
}
private on(event: string, handler: (...args: string[]) => Promise<void>) {
this.emitter.on(event, (...args: string[]) => {
handler(...args).catch(err => {
console.error(JSON.stringify({
level: 'error',
msg: 'cache_invalidation_failed',
event,
error: (err as Error).message,
}));
});
});
}
emit(event: string, ...args: string[]) {
this.emitter.emit(event, ...args);
}
}
const invalidation = new CacheInvalidationBus(cache);
// In your update handlers:
async function updateUserHandler(req: Request, res: Response) {
const userId = req.params.id;
await db.query(
'UPDATE users SET name = $2 WHERE id = $1',
[userId, req.body.name]
);
invalidation.emit('user:updated', userId);
res.json({ success: true });
}
// Pattern 2: Versioned cache keys
// Instead of invalidating, bump a version number
// This is simpler and avoids the "did invalidation succeed?" problem
class VersionedCache {
constructor(
private redis: Redis,
private cacheManager: CacheManager
) {}
private versionKey(entity: string, id: string): string {
return `v:${entity}:${id}`;
}
private cacheKey(entity: string, id: string, version: number): string {
return `${entity}:${id}:v${version}`;
}
async get<T>(
entity: string,
id: string,
loader: () => Promise<T>,
ttlSeconds: number
): Promise<T> {
// Get current version
const version = Number(await this.redis.get(this.versionKey(entity, id))) || 1;
const key = this.cacheKey(entity, id, version);
return this.cacheManager.get(key, loader, { ttlSeconds });
}
async invalidate(entity: string, id: string): Promise<void> {
// Just bump the version — old cache entries expire naturally
await this.redis.incr(this.versionKey(entity, id));
// No need to find and delete old entries!
}
}
const versionedCache = new VersionedCache(redis, cache);
// Usage — invalidation is now just an INCR, never fails
async function getUserVersioned(id: string): Promise<User | null> {
return versionedCache.get('user', id, async () => {
const { rows } = await db.query(
'SELECT * FROM users WHERE id = $1', [id]
);
return rows[0] || null;
}, 3600);
}
async function updateUserVersioned(id: string, data: Partial<User>) {
await db.query('UPDATE users SET name = $2 WHERE id = $1', [id, data.name]);
await versionedCache.invalidate('user', id);
}

The moment one server is not enough, you need to distribute traffic across multiple servers. This sounds simple. It is not.
# /etc/nginx/nginx.conf
# Production load balancer configuration
# Worker processes = CPU cores
worker_processes auto;
worker_rlimit_nofile 65535;
events {
worker_connections 65535;
multi_accept on;
use epoll;
}
http {
# Basic settings
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 1000;
# Buffer sizes
client_body_buffer_size 16k;
client_header_buffer_size 1k;
client_max_body_size 50m;
large_client_header_buffers 4 8k;
# Gzip compression
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 5; # 5 is the sweet spot (6+ uses too much CPU)
gzip_min_length 256;
gzip_types
text/plain
text/css
text/javascript
application/json
application/javascript
application/xml
image/svg+xml;
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;
# Connection limiting
limit_conn_zone $binary_remote_addr zone=addr:10m;
# Upstream (your app servers)
upstream app_servers {
# Least connections — best for varying response times
least_conn;
# App servers with health checks
server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
# Keep connections alive to app servers
keepalive 32;
}
# WebSocket upstream (if needed)
upstream ws_servers {
# IP hash for sticky sessions (per-connection WebSocket state lives on one backend)
ip_hash;
server 10.0.1.10:3001;
server 10.0.1.11:3001;
server 10.0.1.12:3001;
}
server {
listen 80;
listen 443 ssl http2;
server_name example.com;
# SSL (use certbot to generate)
ssl_certificate /etc/ssl/certs/example.com.pem;
ssl_certificate_key /etc/ssl/private/example.com.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
# Security headers
add_header X-Frame-Options DENY always;
add_header X-Content-Type-Options nosniff always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Strict-Transport-Security "max-age=63072000" always;
# Static assets — served directly by Nginx
location /_next/static/ {
alias /var/www/app/.next/static/;
expires 365d;
add_header Cache-Control "public, immutable";
}
location /static/ {
alias /var/www/app/public/static/;
expires 30d;
add_header Cache-Control "public";
}
# API routes with rate limiting
location /api/ {
limit_req zone=api burst=50 nodelay;
limit_conn addr 30;
proxy_pass http://app_servers;
proxy_http_version 1.1;
proxy_set_header Connection ""; # Enable keepalive
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
}
# Stricter rate limit for auth endpoints
location /api/auth/ {
limit_req zone=login burst=3 nodelay;
proxy_pass http://app_servers;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# WebSocket
location /ws/ {
proxy_pass http://ws_servers;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 3600s; # 1 hour for WebSocket
proxy_send_timeout 3600s;
}
# Health check endpoint (no rate limit, no logging)
location /api/health {
access_log off;
proxy_pass http://app_servers;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
# Everything else
location / {
proxy_pass http://app_servers;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
}

When you deploy a new version, the old process needs to stop accepting new requests, finish processing in-flight requests, and then exit. If you just kill the process, every in-flight request returns a 502 error. At 1000 requests per second, a 30-second deploy means 30,000 failed requests.
// Graceful shutdown handler — use this in EVERY production Node.js app
import http from 'http';
interface ShutdownConfig {
server: http.Server;
db: DatabasePool;
redis: Redis;
gracePeriodMs: number; // How long to wait for in-flight requests
}
function setupGracefulShutdown(config: ShutdownConfig) {
let isShuttingDown = false;
let activeRequests = 0;
// Track active requests
config.server.on('request', (_req, res) => {
if (isShuttingDown) {
res.writeHead(503, { 'Retry-After': '30' });
res.end('Server is shutting down');
return;
}
activeRequests++;
// 'close' fires for both completed and aborted responses, so the counter
// cannot leak when a client disconnects mid-request
res.once('close', () => { activeRequests--; });
});
async function shutdown(signal: string) {
if (isShuttingDown) return;
isShuttingDown = true;
console.log(JSON.stringify({
level: 'info',
msg: 'shutdown_initiated',
signal,
activeRequests,
}));
// Step 1: Stop accepting new connections
config.server.close();
// Step 2: Wait for in-flight requests to complete
const deadline = Date.now() + config.gracePeriodMs;
while (activeRequests > 0 && Date.now() < deadline) {
await new Promise(resolve => setTimeout(resolve, 100));
}
if (activeRequests > 0) {
console.log(JSON.stringify({
level: 'warn',
msg: 'forced_shutdown',
remainingRequests: activeRequests,
}));
}
// Step 3: Close database connections
try {
await config.db.shutdown();
} catch (error) {
console.error('DB shutdown error:', (error as Error).message);
}
// Step 4: Close Redis connections
try {
config.redis.disconnect();
} catch (error) {
console.error('Redis shutdown error:', (error as Error).message);
}
console.log(JSON.stringify({
level: 'info',
msg: 'shutdown_complete',
}));
process.exit(0);
}
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
// Prevent crashes from taking down the process without cleanup
process.on('uncaughtException', (error) => {
console.error(JSON.stringify({
level: 'fatal',
msg: 'uncaught_exception',
error: error.message,
stack: error.stack,
}));
shutdown('uncaughtException');
});
process.on('unhandledRejection', (reason) => {
console.error(JSON.stringify({
level: 'fatal',
msg: 'unhandled_rejection',
reason: String(reason),
}));
// Don't shutdown on unhandled rejections — log and continue
// But DO fix them. Every unhandled rejection is a bug.
});
}
// Usage
const server = app.listen(3000, () => {
console.log('Server started on port 3000');
});
setupGracefulShutdown({
server,
db,
redis,
gracePeriodMs: 30000, // 30 seconds to finish in-flight requests
});

Health checks seem trivial. They are not. A bad health check is worse than no health check — it either cries wolf (healthy server removed from rotation) or hides problems (unhealthy server keeps receiving traffic).
// Health check endpoint with dependency checking
interface HealthStatus {
status: 'healthy' | 'degraded' | 'unhealthy';
checks: Record<string, {
status: 'pass' | 'fail';
latency_ms: number;
message?: string;
}>;
uptime_seconds: number;
version: string;
}
const startTime = Date.now();
async function healthCheck(): Promise<HealthStatus> {
const checks: HealthStatus['checks'] = {};
// Check database
const dbStart = performance.now();
try {
await db.query('SELECT 1');
checks.database = {
status: 'pass',
latency_ms: Math.round(performance.now() - dbStart),
};
} catch (error) {
checks.database = {
status: 'fail',
latency_ms: Math.round(performance.now() - dbStart),
message: (error as Error).message,
};
}
// Check Redis
const redisStart = performance.now();
try {
await redis.ping();
checks.redis = {
status: 'pass',
latency_ms: Math.round(performance.now() - redisStart),
};
} catch (error) {
checks.redis = {
status: 'fail',
latency_ms: Math.round(performance.now() - redisStart),
message: (error as Error).message,
};
}
// Check disk space (for apps that write to disk)
// Check memory usage
const memUsage = process.memoryUsage();
const heapUsedPct = (memUsage.heapUsed / memUsage.heapTotal) * 100;
checks.memory = {
status: heapUsedPct < 90 ? 'pass' : 'fail',
latency_ms: 0,
message: `Heap: ${Math.round(heapUsedPct)}% (${Math.round(memUsage.heapUsed / 1024 / 1024)}MB)`,
};
// Determine overall status
const allPassing = Object.values(checks).every(c => c.status === 'pass');
// Database down = unhealthy (remove from rotation)
// Redis down = degraded (keep serving, but warn)
let status: HealthStatus['status'] = 'healthy';
if (checks.database?.status === 'fail' || checks.memory?.status === 'fail') {
status = 'unhealthy';
} else if (!allPassing) {
status = 'degraded';
}
return {
status,
checks,
uptime_seconds: Math.round((Date.now() - startTime) / 1000),
version: process.env.APP_VERSION || 'unknown',
};
}
// Endpoint
app.get('/api/health', async (_req, res) => {
const health = await healthCheck();
const statusCode = health.status === 'unhealthy' ? 503 : 200;
res.status(statusCode).json(health);
});
// Lightweight liveness probe (for Kubernetes)
// This should ALWAYS return 200 unless the process is completely broken
app.get('/api/health/live', (_req, res) => {
res.status(200).json({ status: 'alive' });
});
// Readiness probe (for Kubernetes)
// Returns 503 during startup and shutdown
app.get('/api/health/ready', async (_req, res) => {
// (Assumes the shutdown module exports its isShuttingDown flag)
if (isShuttingDown) {
return res.status(503).json({ status: 'shutting_down' });
}
// Check critical dependencies
try {
await db.query('SELECT 1');
res.status(200).json({ status: 'ready' });
} catch {
res.status(503).json({ status: 'not_ready' });
}
});

Here is a truth that took me too long to learn: the fastest way to make your API fast is to stop doing work in the request handler. Send the response immediately, do the work later.
Sending a welcome email? That takes 2 seconds. Processing an uploaded image? That takes 10 seconds. Generating a PDF report? That takes 30 seconds. None of these need to happen before you send the HTTP response.
// Production job queue with BullMQ
// This is the pattern I use for every application past 1K users
import { Queue, Worker, Job } from 'bullmq';
import { Redis } from 'ioredis';
// Shared Redis connection for all queues
const connection = new Redis({
host: process.env.REDIS_HOST,
port: Number(process.env.REDIS_PORT) || 6379,
password: process.env.REDIS_PASSWORD,
maxRetriesPerRequest: null, // Required by BullMQ
});
// Define job types
interface JobTypes {
'email:welcome': { userId: string; email: string };
'email:password-reset': { userId: string; token: string };
'image:resize': { imageId: string; sizes: number[] };
'report:generate': { userId: string; reportType: string; dateRange: [string, string] };
'analytics:process': { eventBatch: Array<{ type: string; data: unknown }> };
'webhook:deliver': { url: string; payload: unknown; attemptNumber: number };
}
type JobName = keyof JobTypes;
// Queue factory
function createQueue(name: string) {
return new Queue(name, {
connection,
defaultJobOptions: {
removeOnComplete: {
age: 24 * 3600, // Keep completed jobs for 24 hours
count: 1000, // Keep last 1000 completed jobs
},
removeOnFail: {
age: 7 * 24 * 3600, // Keep failed jobs for 7 days
},
attempts: 3,
backoff: {
type: 'exponential',
delay: 1000, // 1s, 2s, 4s
},
},
});
}
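// The schedule the options above configure: BullMQ's built-in exponential
// backoff waits delay * 2^(attemptsMade - 1) between retries, i.e. 1s, 2s, 4s
// for delay=1000. A sketch of that formula; the jitter and cap parameters are
// my additions (not BullMQ defaults) to avoid synchronized retry storms:
function backoffDelayMs(baseMs: number, attempt: number, capMs = 60_000, jitter = false): number {
  const exponential = baseMs * Math.pow(2, attempt - 1);
  // Up to 25% random jitter spreads out retries from many failed jobs
  const withJitter = jitter ? exponential + Math.random() * 0.25 * exponential : exponential;
  return Math.min(withJitter, capMs);
}
// backoffDelayMs(1000, 1) === 1000; backoffDelayMs(1000, 2) === 2000; backoffDelayMs(1000, 3) === 4000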
// Create queues with different priorities
const emailQueue = createQueue('email');
const imageQueue = createQueue('image-processing');
const reportQueue = createQueue('report-generation');
const analyticsQueue = createQueue('analytics');
const webhookQueue = createQueue('webhook-delivery');
// Type-safe job enqueue
async function enqueueJob<T extends JobName>(
name: T,
data: JobTypes[T],
options?: { priority?: number; delay?: number; deduplicationId?: string }
): Promise<string> {
const queueMap: Record<string, Queue> = {
'email:welcome': emailQueue,
'email:password-reset': emailQueue,
'image:resize': imageQueue,
'report:generate': reportQueue,
'analytics:process': analyticsQueue,
'webhook:deliver': webhookQueue,
};
const queue = queueMap[name];
if (!queue) throw new Error(`No queue for job type: ${name}`);
// Deduplication — prevent the same job from being enqueued twice
const jobId = options?.deduplicationId || undefined;
const job = await queue.add(name, data, {
priority: options?.priority || 0,
delay: options?.delay || 0,
jobId,
});
console.log(JSON.stringify({
level: 'info',
msg: 'job_enqueued',
queue: queue.name,
jobName: name,
jobId: job.id,
}));
return job.id!;
}
// Example: Webhook delivery with exponential backoff and dead letter queue
const webhookWorker = new Worker(
'webhook-delivery',
async (job: Job) => {
const { url, payload, attemptNumber } = job.data;
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10000); // 10s timeout
try {
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Webhook-Signature': computeHmac(JSON.stringify(payload)),
'X-Webhook-Attempt': String(attemptNumber),
},
body: JSON.stringify(payload),
signal: controller.signal,
});
if (!response.ok) {
throw new Error(`Webhook returned ${response.status}: ${response.statusText}`);
}
console.log(JSON.stringify({
level: 'info',
msg: 'webhook_delivered',
url,
attempt: attemptNumber,
status: response.status,
}));
} finally {
clearTimeout(timeout);
}
},
{
connection,
concurrency: 10, // Process 10 webhooks in parallel
limiter: {
max: 100, // Max 100 jobs per...
duration: 1000, // ...second (rate limiting)
},
}
);
// Handle failed jobs: park them in a dedicated dead letter queue once all
// retries are exhausted. (Re-enqueueing to the delivery queue itself would
// just retry, and fail, forever.)
const webhookDeadLetterQueue = createQueue('webhook-dead-letter');
webhookWorker.on('failed', async (job, error) => {
  if (job && job.attemptsMade >= (job.opts.attempts || 3)) {
    console.error(JSON.stringify({
      level: 'error',
      msg: 'webhook_dead_letter',
      jobId: job.id,
      url: job.data.url,
      attempts: job.attemptsMade,
      error: error.message,
    }));
    // Move to the dead letter queue for manual review
    await webhookDeadLetterQueue.add('webhook:dead-letter', {
      ...job.data,
      attemptNumber: job.attemptsMade,
    }, { attempts: 1 }); // no automatic retries out of the DLQ
  }
});
function computeHmac(payload: string): string {
const crypto = require('crypto');
return crypto
.createHmac('sha256', process.env.WEBHOOK_SECRET!)
.update(payload)
.digest('hex');
}
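// Receiver-side counterpart to computeHmac (a sketch, not part of the worker
// above): recompute the HMAC over the raw request body and compare with a
// constant-time comparison so the check does not leak matching bytes through
// response timing.
function verifyWebhookSignature(rawBody: string, signature: string, secret: string): boolean {
  const crypto = require('crypto');
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'hex');
  const b = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so guard first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}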
// Example API handler that returns immediately
app.post('/api/reports', async (req, res) => {
const { reportType, startDate, endDate } = req.body;
const userId = req.user.id;
// Enqueue the job — takes < 5ms
const jobId = await enqueueJob('report:generate', {
userId,
reportType,
dateRange: [startDate, endDate],
});
// Return immediately — the report generates in the background
res.status(202).json({
message: 'Report generation started',
jobId,
statusUrl: `/api/reports/status/${jobId}`,
});
});
// Status endpoint for long-running jobs
app.get('/api/reports/status/:jobId', async (req, res) => {
const job = await reportQueue.getJob(req.params.jobId);
if (!job) {
return res.status(404).json({ error: 'Job not found' });
}
const state = await job.getState();
const progress = job.progress;
res.json({
jobId: job.id,
state, // 'waiting' | 'active' | 'completed' | 'failed'
progress, // 0-100
result: state === 'completed' ? job.returnvalue : undefined,
failedReason: state === 'failed' ? job.failedReason : undefined,
});
});

In a distributed system, messages will be delivered more than once. Network timeouts, worker crashes, retry logic — all of these can cause a job to be processed multiple times. Your system must handle this gracefully.
// Idempotent job processing with deduplication
class IdempotencyGuard {
constructor(private redis: Redis) {}
// Check if a job has already been processed
async isProcessed(idempotencyKey: string): Promise<boolean> {
const exists = await this.redis.exists(`idempotent:${idempotencyKey}`);
return exists === 1;
}
// Mark a job as processing (with TTL to prevent permanent locks)
async markProcessing(idempotencyKey: string): Promise<boolean> {
// NX = only set if not exists, EX = expire after 300 seconds
const result = await this.redis.set(
`idempotent:${idempotencyKey}`,
'processing',
'EX', 300,
'NX'
);
return result === 'OK';
}
// Mark a job as completed
async markCompleted(idempotencyKey: string, ttlSeconds = 86400): Promise<void> {
await this.redis.set(
`idempotent:${idempotencyKey}`,
'completed',
'EX', ttlSeconds
);
}
}
const idempotency = new IdempotencyGuard(redis);
// Usage in a payment processing worker
async function processPayment(job: Job) {
const { paymentId, userId, amount } = job.data;
const idempotencyKey = `payment:${paymentId}`;
// Check if already processed
if (await idempotency.isProcessed(idempotencyKey)) {
console.log(JSON.stringify({
level: 'info',
msg: 'duplicate_payment_skipped',
paymentId,
}));
return; // Already processed, skip
}
// Try to acquire processing lock
const acquired = await idempotency.markProcessing(idempotencyKey);
if (!acquired) {
// Another worker is processing this — skip
return;
}
try {
// Process the payment
await db.transaction(async (client) => {
// Debit the user's account
const result = await client.query(
`UPDATE accounts
SET balance = balance - $1
WHERE user_id = $2 AND balance >= $1
RETURNING balance`,
[amount, userId]
);
if (result.rowCount === 0) {
throw new Error('Insufficient balance');
}
// Record the transaction
await client.query(
`INSERT INTO transactions (payment_id, user_id, amount, status, created_at)
VALUES ($1, $2, $3, 'completed', now())
ON CONFLICT (payment_id) DO NOTHING`, // Extra safety: DB-level idempotency
[paymentId, userId, amount]
);
});
await idempotency.markCompleted(idempotencyKey);
console.log(JSON.stringify({
level: 'info',
msg: 'payment_processed',
paymentId,
amount,
userId,
}));
} catch (error) {
// Clear the processing lock so it can be retried
await redis.del(`idempotent:${idempotencyKey}`);
throw error;
}
}

A CDN is not just "put Cloudflare in front of your site." At scale, your CDN strategy determines whether your users get a 50ms response or a 500ms response, and whether your origin servers handle 1000 requests per second or 100,000.
// Build-time asset optimization pipeline
// This runs during your CI/CD build step
import sharp from 'sharp';
import { createHash } from 'crypto';
import { readFile, writeFile, readdir, mkdir } from 'fs/promises';
import { join, extname, basename } from 'path';
interface OptimizedAsset {
originalPath: string;
optimizedPath: string;
originalSize: number;
optimizedSize: number;
hash: string;
format: string;
}
class AssetPipeline {
private manifest: Map<string, OptimizedAsset> = new Map();
constructor(
private inputDir: string,
private outputDir: string
) {}
async processImages(): Promise<void> {
const files = await readdir(this.inputDir, { recursive: true });
const imageFiles = files.filter(f =>
/\.(png|jpg|jpeg|webp|gif)$/i.test(String(f))
);
for (const file of imageFiles) {
const filePath = join(this.inputDir, String(file));
await this.optimizeImage(filePath);
}
// Write manifest for the application to use
const manifestObj = Object.fromEntries(this.manifest);
await writeFile(
join(this.outputDir, 'asset-manifest.json'),
JSON.stringify(manifestObj, null, 2)
);
}
private async optimizeImage(inputPath: string): Promise<void> {
const buffer = await readFile(inputPath);
const hash = createHash('md5').update(buffer).digest('hex').substring(0, 8);
const name = basename(inputPath, extname(inputPath));
// Generate multiple formats
const formats = [
  { ext: 'webp' as const, options: { quality: 80 } },
  { ext: 'avif' as const, options: { quality: 65 } },
];
// Generate multiple sizes for responsive images
const sizes = [320, 640, 960, 1280, 1920];
const metadata = await sharp(buffer).metadata();
const originalWidth = metadata.width || 1920;
for (const format of formats) {
  for (const width of sizes) {
    if (width > originalWidth) continue;
    const outputName = `${name}-${width}w-${hash}.${format.ext}`;
    const outputPath = join(this.outputDir, 'images', outputName);
    await mkdir(join(this.outputDir, 'images'), { recursive: true });
    const result = await sharp(buffer)
      .resize(width)
      [format.ext](format.options)
      .toFile(outputPath);
this.manifest.set(`${name}-${width}.${format.ext}`, {
originalPath: inputPath,
optimizedPath: outputPath,
originalSize: buffer.length,
optimizedSize: result.size,
hash,
format: format.ext,
});
}
}
}
}
// Responsive image component that uses the optimized assets
// This is the server component that generates the HTML
function OptimizedImage({
src,
alt,
sizes = '100vw',
className,
priority = false,
}: {
src: string;
alt: string;
sizes?: string;
className?: string;
priority?: boolean;
}) {
const baseName = src.replace(/\.[^.]+$/, '');
const widths = [320, 640, 960, 1280, 1920];
// In practice these URLs come from asset-manifest.json (the generated
// filenames include the content hash); the unhashed paths here are a
// simplification
const avifSrcSet = widths
  .map(w => `/_next/static/images/${baseName}-${w}w.avif ${w}w`)
  .join(', ');
const webpSrcSet = widths
  .map(w => `/_next/static/images/${baseName}-${w}w.webp ${w}w`)
  .join(', ');
return `
<picture>
<source type="image/avif" srcSet="${avifSrcSet}" sizes="${sizes}" />
<source type="image/webp" srcSet="${webpSrcSet}" sizes="${sizes}" />
<img
src="/_next/static/images/${baseName}-960w.webp"
alt="${alt}"
class="${className || ''}"
loading="${priority ? 'eager' : 'lazy'}"
decoding="async"
${priority ? 'fetchpriority="high"' : ''}
/>
</picture>
`;
}

The real power of a CDN is not caching static files — it is caching dynamic responses. But you need a way to invalidate specific cached responses when the underlying data changes. Surrogate keys solve this.
// Surrogate key-based cache invalidation
// This works with CDNs that support surrogate keys (most enterprise CDNs)
class EdgeCacheManager {
private surrogateKeys: Set<string> = new Set();
// Tag a response with surrogate keys
addKey(key: string): void {
this.surrogateKeys.add(key);
}
// Get the header value to send with the response
getHeader(): string {
return Array.from(this.surrogateKeys).join(' ');
}
// Purge all responses tagged with a specific key
static async purge(keys: string[]): Promise<void> {
// This calls your CDN's purge API
// Example for a generic CDN API:
try {
await fetch(process.env.CDN_PURGE_URL!, {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.CDN_API_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
surrogate_keys: keys,
}),
});
console.log(JSON.stringify({
level: 'info',
msg: 'cdn_purge',
keys,
}));
} catch (error) {
console.error(JSON.stringify({
level: 'error',
msg: 'cdn_purge_failed',
keys,
error: (error as Error).message,
}));
}
}
}
// Middleware that sets cache headers and surrogate keys
function edgeCacheMiddleware(req: Request, res: Response, next: NextFunction) {
const edgeCache = new EdgeCacheManager();
(res as any).edgeCache = edgeCache;
// Override res.json to add cache headers
const originalJson = res.json.bind(res);
res.json = (body: unknown) => {
const keys = edgeCache.getHeader();
if (keys) {
res.setHeader('Surrogate-Key', keys);
}
return originalJson(body);
};
next();
}
// Usage in route handlers
app.get('/api/products/:id', edgeCacheMiddleware, async (req, res) => {
const product = await getProduct(req.params.id);
// Tag this response so we can purge it later
(res as any).edgeCache.addKey(`product:${req.params.id}`);
(res as any).edgeCache.addKey(`products`);
(res as any).edgeCache.addKey(`category:${product.category}`);
res.setHeader('Cache-Control', 'public, max-age=60, s-maxage=3600');
res.json(product);
});
// When a product is updated, purge its cached responses
app.put('/api/products/:id', async (req, res) => {
await updateProduct(req.params.id, req.body);
// Purge CDN cache for this product and its category
const product = await getProduct(req.params.id);
await EdgeCacheManager.purge([
`product:${req.params.id}`,
`category:${product.category}`,
'products', // Also purge any product listing pages
]);
res.json({ success: true });
});

You cannot fix what you cannot see. At 100 users, console.log is fine. At 100K users, you need structured logging, distributed tracing, and an alerting strategy that wakes you up for real problems and lets you sleep through noise.
// Production logging that you can actually query and alert on
enum LogLevel {
DEBUG = 'debug',
INFO = 'info',
WARN = 'warn',
ERROR = 'error',
FATAL = 'fatal',
}
interface LogEntry {
timestamp: string;
level: LogLevel;
msg: string;
service: string;
traceId?: string;
spanId?: string;
userId?: string;
[key: string]: unknown;
}
class StructuredLogger {
private service: string;
private defaultFields: Record<string, unknown>;
constructor(service: string, defaultFields: Record<string, unknown> = {}) {
this.service = service;
this.defaultFields = defaultFields;
}
private log(level: LogLevel, msg: string, fields: Record<string, unknown> = {}) {
const entry: LogEntry = {
timestamp: new Date().toISOString(),
level,
msg,
service: this.service,
...this.defaultFields,
...fields,
};
// Mask PII in log entries
const masked = this.maskPII(entry);
// Write to stdout — let the log aggregator handle routing
process.stdout.write(JSON.stringify(masked) + '\n');
}
debug(msg: string, fields?: Record<string, unknown>) { this.log(LogLevel.DEBUG, msg, fields); }
info(msg: string, fields?: Record<string, unknown>) { this.log(LogLevel.INFO, msg, fields); }
warn(msg: string, fields?: Record<string, unknown>) { this.log(LogLevel.WARN, msg, fields); }
error(msg: string, fields?: Record<string, unknown>) { this.log(LogLevel.ERROR, msg, fields); }
fatal(msg: string, fields?: Record<string, unknown>) { this.log(LogLevel.FATAL, msg, fields); }
// Create a child logger with additional default fields
child(fields: Record<string, unknown>): StructuredLogger {
const child = new StructuredLogger(this.service, {
...this.defaultFields,
...fields,
});
return child;
}
private maskPII(entry: LogEntry): LogEntry {
const masked = { ...entry };
// Mask email addresses
if (typeof masked.email === 'string') {
const [name, domain] = masked.email.split('@');
masked.email = `${name[0]}***@${domain}`;
}
// Mask IP addresses (keep first two octets for geo debugging)
if (typeof masked.ip === 'string') {
const parts = masked.ip.split('.');
masked.ip = `${parts[0]}.${parts[1]}.x.x`;
}
// Remove any field named password, secret, token, etc.
for (const key of Object.keys(masked)) {
if (/password|secret|token|authorization|cookie/i.test(key)) {
masked[key] = '[REDACTED]';
}
}
return masked;
}
}
// Create logger per module
const logger = new StructuredLogger('api-server');
// Request logging middleware
function requestLogger(req: Request, res: Response, next: NextFunction) {
const start = performance.now();
const traceId = req.headers['x-trace-id'] as string || generateTraceId();
// Create a request-scoped logger
const reqLogger = logger.child({
traceId,
method: req.method,
path: req.path,
userAgent: req.headers['user-agent'],
ip: req.ip,
});
// Attach to request for use in handlers
(req as any).logger = reqLogger;
(req as any).traceId = traceId;
// Set trace ID header on response
res.setHeader('X-Trace-Id', traceId);
// Log when response is sent
res.on('finish', () => {
const duration = performance.now() - start;
const level = res.statusCode >= 500 ? 'error'
: res.statusCode >= 400 ? 'warn'
: 'info';
reqLogger[level]('request_completed', {
statusCode: res.statusCode,
duration_ms: Math.round(duration),
contentLength: res.getHeader('content-length'),
});
});
next();
}
function generateTraceId(): string {
  // randomBytes avoids relying on the global WebCrypto object (Node 19+ only)
  return require('crypto').randomBytes(16).toString('hex');
}

The RED method is the simplest framework for service monitoring that actually works. For every service, measure Rate (requests per second), Errors (the percentage of requests that fail), and Duration (the distribution of response times: p50, p95, p99, never just the average).
// RED metrics collection
class RedMetrics {
// Store [timestamp, durationMs] pairs so percentiles use the same window as rate
private samples: Array<[number, number]> = [];
private errors: number[] = []; // timestamps of errors
private readonly windowMs = 60000; // 1-minute sliding window
private readonly maxSamples = 10000;
recordRequest(duration: number, isError: boolean) {
const now = Date.now();
this.samples.push([now, duration]);
if (isError) this.errors.push(now);
// Trim old data
if (this.samples.length > this.maxSamples) {
this.cleanup();
}
}
getMetrics(): {
rate: number;
errorRate: number;
p50: number;
p95: number;
p99: number;
} {
this.cleanup();
const windowStart = Date.now() - this.windowMs;
const recent = this.samples.filter(([t]) => t >= windowStart);
const recentErrors = this.errors.filter(t => t >= windowStart);
// Percentiles are computed over the same 1-minute window as the rate
const recentDurations = recent.map(([, d]) => d).sort((a, b) => a - b);
const rate = recent.length / (this.windowMs / 1000);
const errorRate = recent.length > 0
? (recentErrors.length / recent.length) * 100
: 0;
return {
rate: Math.round(rate * 100) / 100,
errorRate: Math.round(errorRate * 100) / 100,
p50: this.percentile(recentDurations, 50),
p95: this.percentile(recentDurations, 95),
p99: this.percentile(recentDurations, 99),
};
}
private percentile(sorted: number[], pct: number): number {
if (sorted.length === 0) return 0;
const index = Math.ceil((pct / 100) * sorted.length) - 1;
return Math.round(sorted[Math.max(0, index)]);
}
private cleanup() {
const cutoff = Date.now() - this.windowMs * 5; // Keep 5 minutes of data
this.samples = this.samples.filter(([t]) => t >= cutoff);
this.errors = this.errors.filter(t => t >= cutoff);
// Hard cap on memory regardless of traffic volume
if (this.samples.length > this.maxSamples) {
this.samples = this.samples.slice(-this.maxSamples);
}
}
}
// Per-route metrics
const routeMetrics = new Map<string, RedMetrics>();
function getRouteMetrics(route: string): RedMetrics {
if (!routeMetrics.has(route)) {
routeMetrics.set(route, new RedMetrics());
}
return routeMetrics.get(route)!;
}
// Metrics middleware
function metricsMiddleware(req: Request, res: Response, next: NextFunction) {
const start = performance.now();
res.on('finish', () => {
const duration = performance.now() - start;
// Prefer the route pattern ('/users/:id'); raw paths with IDs in them would explode metric cardinality
const route = req.route?.path || req.path;
const isError = res.statusCode >= 500;
getRouteMetrics(route).recordRequest(duration, isError);
getRouteMetrics('_global').recordRequest(duration, isError);
});
next();
}
// Metrics endpoint (scrape this with your monitoring tool)
app.get('/api/metrics', (_req, res) => {
const metrics: Record<string, ReturnType<RedMetrics['getMetrics']>> = {};
for (const [route, m] of routeMetrics) {
metrics[route] = m.getMetrics();
}
res.json({
timestamp: new Date().toISOString(),
metrics,
process: {
uptime: process.uptime(),
memory: process.memoryUsage(),
cpu: process.cpuUsage(),
},
});
});

Service Level Objectives tell you whether your system is healthy in terms that users care about. Not "CPU is at 40%," but "99.9% of requests complete in under 500ms."
// SLO monitoring and error budget tracking
interface SLO {
name: string;
target: number; // e.g., 0.999 = 99.9%
window: 'rolling_7d' | 'rolling_30d' | 'calendar_month';
indicator: SLI;
}
interface SLI {
type: 'availability' | 'latency' | 'quality';
goodEventFn: (req: RequestLog) => boolean;
validEventFn: (req: RequestLog) => boolean;
}
interface RequestLog {
statusCode: number;
duration: number;
path: string;
}
class SLOTracker {
private logs: RequestLog[] = [];
private readonly maxLogs = 1_000_000;
constructor(private slos: SLO[]) {}
record(log: RequestLog) {
this.logs.push(log);
if (this.logs.length > this.maxLogs) {
// Cap memory: drop the oldest half once the buffer fills
this.logs = this.logs.slice(-Math.floor(this.maxLogs / 2));
}
}
getStatus(): Array<{
name: string;
target: number;
current: number;
errorBudgetRemaining: number;
status: 'ok' | 'warning' | 'critical';
}> {
return this.slos.map(slo => {
const validEvents = this.logs.filter(log =>
slo.indicator.validEventFn(log)
);
const goodEvents = validEvents.filter(log =>
slo.indicator.goodEventFn(log)
);
const current = validEvents.length > 0
? goodEvents.length / validEvents.length
: 1;
// Error budget: how many more failures can we tolerate?
const totalBudget = (1 - slo.target) * validEvents.length;
const usedBudget = validEvents.length - goodEvents.length;
const errorBudgetRemaining = totalBudget > 0
? Math.max(0, (totalBudget - usedBudget) / totalBudget)
: 1;
let status: 'ok' | 'warning' | 'critical' = 'ok';
if (errorBudgetRemaining < 0.1) status = 'critical';
else if (errorBudgetRemaining < 0.3) status = 'warning';
return {
name: slo.name,
target: slo.target,
current: Math.round(current * 10000) / 10000,
errorBudgetRemaining: Math.round(errorBudgetRemaining * 1000) / 1000,
status,
};
});
}
}
// Define your SLOs
const sloTracker = new SLOTracker([
{
name: 'API Availability',
target: 0.999, // 99.9%
window: 'rolling_30d',
indicator: {
type: 'availability',
validEventFn: (req) => !req.path.startsWith('/api/health'),
goodEventFn: (req) => req.statusCode < 500,
},
},
{
name: 'API Latency (p99 < 500ms)',
target: 0.99, // 99%
window: 'rolling_7d',
indicator: {
type: 'latency',
validEventFn: (req) => req.path.startsWith('/api/'),
goodEventFn: (req) => req.duration < 500,
},
},
{
name: 'Checkout Success Rate',
target: 0.995, // 99.5%
window: 'rolling_7d',
indicator: {
type: 'quality',
validEventFn: (req) => req.path === '/api/checkout',
goodEventFn: (req) => req.statusCode < 400,
},
},
]);
// Alert when error budget is running low
setInterval(() => {
const statuses = sloTracker.getStatus();
for (const slo of statuses) {
if (slo.status === 'critical') {
console.error(JSON.stringify({
level: 'error',
msg: 'slo_budget_critical',
slo: slo.name,
current: slo.current,
target: slo.target,
budgetRemaining: slo.errorBudgetRemaining,
}));
// Trigger PagerDuty/Opsgenie/whatever
} else if (slo.status === 'warning') {
console.warn(JSON.stringify({
level: 'warn',
msg: 'slo_budget_warning',
slo: slo.name,
current: slo.current,
target: slo.target,
budgetRemaining: slo.errorBudgetRemaining,
}));
}
}
}, 60000); // Check every minute

These are real incidents. Names changed, numbers approximate, lessons permanent.
A SaaS application had a dashboard that showed "recent activity." The query was:
-- This query ran on every dashboard page load
-- There was no index on (organization_id, created_at)
SELECT *
FROM activity_log
WHERE organization_id = $1
ORDER BY created_at DESC
LIMIT 50;

At 10K rows, it returned in 5ms. At 10M rows, it took 8 seconds. But it did not fail immediately — it degraded slowly. By the time anyone noticed, the database CPU was at 100%, other queries were timing out, and the site was essentially down for 4 hours during US business hours.
The fix was a single line:
CREATE INDEX CONCURRENTLY idx_activity_org_created
ON activity_log (organization_id, created_at DESC);

Query time dropped from 8 seconds to 2ms. The CONCURRENTLY keyword is important — without it, the CREATE INDEX statement locks the table and blocks all writes for the duration of the index build. On a 10M-row table, that could be 10+ minutes of downtime.
Lesson: Run EXPLAIN ANALYZE on every query that touches a table with more than 10K rows. Set up slow query logging from day one. Do not wait until it hurts.
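Application-side timing gets you slow-query logging on day one, even before you touch database settings. A sketch, assuming a pg-style client; the `Queryable` shape, the threshold, and the log format here are illustrative, not from any particular library:

```typescript
// Minimal structural type: pg's Pool satisfies this, as would most clients
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

const SLOW_QUERY_MS = 200; // Illustrative threshold; tune to your latency budget

// Time every statement; emit a structured log line when one crosses the threshold
async function timedQuery(
  db: Queryable,
  text: string,
  params: unknown[] = [],
  onSlow: (e: { query: string; duration_ms: number }) => void = (e) =>
    console.warn(JSON.stringify({ level: 'warn', msg: 'slow_query', ...e })),
): Promise<unknown> {
  const start = performance.now();
  try {
    return await db.query(text, params);
  } finally {
    const ms = performance.now() - start;
    if (ms > SLOW_QUERY_MS) {
      // Log the statement text, never the parameter values (they may contain PII)
      onSlow({ query: text.slice(0, 200), duration_ms: Math.round(ms) });
    }
  }
}
```

On the database side, PostgreSQL's log_min_duration_statement setting does the same job with zero application code, and running EXPLAIN ANALYZE on the logged statements tells you why they are slow.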
A Node.js application created a new database connection for every request. At 50 requests per second, this was 50 connections opened and closed every second. Each connection takes ~50ms to establish (TCP handshake, SSL negotiation, authentication). That is 2.5 seconds of connection overhead per second of wall time.
But worse: under load, the application would open more connections than PostgreSQL could handle. Once you hit max_connections, new connection attempts fail, which triggers retries, which creates more connection attempts, which creates a cascading failure.
// WHAT THE CODE LOOKED LIKE (this is bad, do not do this)
import { Client } from 'pg';
app.get('/api/data/:id', async (req, res) => {
// New connection per request — this is the bug
const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
try {
const result = await client.query('SELECT * FROM data WHERE id = $1', [req.params.id]);
res.json(result.rows[0]);
} finally {
await client.end(); // Connection destroyed after every request
}
});
// THE FIX: Use a Pool (this is what you should do)
import { Pool } from 'pg';
// One pool, created at startup, shared across all requests
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // Maximum 20 connections to the database
idleTimeoutMillis: 30000,
});
app.get('/api/data/:id', async (req, res) => {
// Pool manages connection lifecycle
const result = await pool.query('SELECT * FROM data WHERE id = $1', [req.params.id]);
res.json(result.rows[0]);
});

An e-commerce site cached its product catalog in Redis with a 5-minute TTL. When the cache expired, the next request would hit the database and repopulate the cache. This worked perfectly at low traffic.
At 10K requests per second, here is what happened when the cache expired: every in-flight request missed at the same instant, and thousands of identical catalog queries hit the database at once. The database saturated, responses slowed, the cache stayed empty even longer, and the miss storm fed itself until the site tipped over.
The fix is the stampede protection pattern shown earlier in this article — use a distributed lock so only one process loads from the database, and everyone else waits for the result. Here it is distilled:
// The stampede protection pattern in its simplest form
async function getWithStampedeProtection<T>(
key: string,
loader: () => Promise<T>,
ttlSeconds: number
): Promise<T> {
// Check cache first
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
// Try to acquire lock
const lockKey = `lock:${key}`;
const locked = await redis.set(lockKey, '1', 'EX', 30, 'NX');
if (locked) {
// We got the lock — load from source
try {
const value = await loader();
await redis.set(key, JSON.stringify(value), 'EX', ttlSeconds);
return value;
} finally {
await redis.del(lockKey);
}
} else {
// Someone else is loading — wait and retry
await new Promise(r => setTimeout(r, 200));
const retryCache = await redis.get(key);
if (retryCache) return JSON.parse(retryCache);
// If still no cache, load anyway (lock holder might have failed)
return loader();
}
}

Node.js applications have a special class of memory leak: event listener accumulation. Every time you add an event listener and forget to remove it, you leak a small amount of memory. At 10 requests per second, you might leak 1KB per second. You will not notice for days.
// THE MEMORY LEAK (extremely common in WebSocket/SSE code)
app.get('/api/stream', (req, res) => {
res.writeHead(200, { 'Content-Type': 'text/event-stream' });
// THIS IS THE LEAK
// Every new SSE connection adds a listener that is never removed
eventEmitter.on('update', (data) => {
res.write(`data: ${JSON.stringify(data)}\n\n`);
});
// When the client disconnects, the listener stays forever
// After 10K connections, you have 10K orphaned listeners
// Each one references the entire closure scope
});
// THE FIX: Always clean up listeners
app.get('/api/stream', (req, res) => {
res.writeHead(200, { 'Content-Type': 'text/event-stream' });
const handler = (data: unknown) => {
res.write(`data: ${JSON.stringify(data)}\n\n`);
};
eventEmitter.on('update', handler);
// Clean up when connection closes
req.on('close', () => {
eventEmitter.removeListener('update', handler);
});
// Also handle errors
req.on('error', () => {
eventEmitter.removeListener('update', handler);
});
});
// MONITORING: Track event listener counts
setInterval(() => {
const listenerCounts: Record<string, number> = {};
for (const event of eventEmitter.eventNames()) {
listenerCounts[String(event)] = eventEmitter.listenerCount(event);
}
// Alert if any event has too many listeners
for (const [event, count] of Object.entries(listenerCounts)) {
if (count > 100) {
console.error(JSON.stringify({
level: 'error',
msg: 'listener_leak_suspected',
event,
count,
}));
}
}
});

A microservices architecture: Service A calls Service B, which calls Service C. Each service makes up to three attempts per request, with no backoff.

Service C goes down. For every request Service B receives, it tries C three times. Service A, seeing B fail, tries B three times, and each of those attempts makes B try C three more times.

Result: 1 original request fans out into 3 × 3 = 9 requests to Service C. With 1000 requests per second arriving at Service A, Service C receives 9000 requests per second — exactly when it is least able to handle them.
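The fan-out above is exponentiation in disguise: each retried hop multiplies the load on the service below it. A quick sketch (the function name and numbers are mine, for illustration only):

```typescript
// Load multiplier on the bottom service of a call chain.
// attemptsPerCall: tries per request at each hop (original attempt + retries)
// retriedHops: call edges between the entry service and the bottom one;
// A -> B -> C has 2 retried hops, so 3 attempts per hop gives 3^2 = 9.
function retryAmplification(attemptsPerCall: number, retriedHops: number): number {
  return Math.pow(attemptsPerCall, retriedHops);
}

// 1000 RPS at Service A becomes 9000 RPS at Service C while C is down
const rpsAtC = 1000 * retryAmplification(3, 2);

// One more service in the chain and the same outage triggers 27x load
const rpsAtD = 1000 * retryAmplification(3, 3);
```

Note that exponential backoff with jitter does not shrink this multiplier; it only spreads the attempts out in time. Retrying at a single layer (the edge), or failing fast with a circuit breaker, is what actually reduces the exponent.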
// The retry strategy that does not create storms
interface RetryConfig {
maxRetries: number;
baseDelayMs: number;
maxDelayMs: number;
jitterFactor: number; // 0 to 1
retryableStatuses: number[];
}
async function fetchWithRetry(
url: string,
options: RequestInit,
config: RetryConfig = {
maxRetries: 3,
baseDelayMs: 1000,
maxDelayMs: 30000,
jitterFactor: 0.5,
retryableStatuses: [429, 502, 503, 504],
}
): Promise<Response> {
let lastError: Error | null = null;
for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
try {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 10000); // 10s timeout per attempt
let response: Response;
try {
response = await fetch(url, {
...options,
signal: controller.signal,
});
} finally {
// Clear the timer even when fetch throws, otherwise it leaks until it fires
clearTimeout(timeout);
}
// Check if we should retry based on status code
if (config.retryableStatuses.includes(response.status) && attempt < config.maxRetries) {
// Respect Retry-After header if present
const retryAfter = response.headers.get('Retry-After');
// Retry-After may be seconds or an HTTP-date; handle the seconds form here
if (retryAfter && Number.isFinite(Number(retryAfter))) {
const waitMs = Number(retryAfter) * 1000;
await new Promise(r => setTimeout(r, Math.min(waitMs, config.maxDelayMs)));
continue;
}
await exponentialBackoff(attempt, config);
continue;
}
return response;
} catch (error) {
lastError = error as Error;
if (attempt < config.maxRetries) {
await exponentialBackoff(attempt, config);
}
}
}
throw lastError || new Error(`Failed after ${config.maxRetries} retries`);
}
function exponentialBackoff(attempt: number, config: RetryConfig): Promise<void> {
// Exponential backoff: 1s, 2s, 4s, 8s, ...
const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
// Add jitter to prevent thundering herd
// Without jitter, all retries happen at the same time
const jitter = cappedDelay * config.jitterFactor * Math.random();
const delay = cappedDelay + jitter;
return new Promise(resolve => setTimeout(resolve, delay));
}
// Circuit breaker: stop calling a service that is clearly down
enum CircuitState {
CLOSED = 'closed', // Normal operation
OPEN = 'open', // Service is down, fail fast
HALF_OPEN = 'half_open', // Testing if service recovered
}
class CircuitBreaker {
private state: CircuitState = CircuitState.CLOSED;
private failures = 0;
private lastFailureTime = 0;
private successesInHalfOpen = 0;
constructor(
private readonly failureThreshold: number = 5,
private readonly resetTimeoutMs: number = 30000,
private readonly halfOpenSuccesses: number = 3,
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === CircuitState.OPEN) {
// Check if enough time has passed to try again
if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
this.state = CircuitState.HALF_OPEN;
this.successesInHalfOpen = 0;
} else {
throw new Error('Circuit breaker is open — service unavailable');
}
}
try {
const result = await fn();
if (this.state === CircuitState.HALF_OPEN) {
this.successesInHalfOpen++;
if (this.successesInHalfOpen >= this.halfOpenSuccesses) {
this.state = CircuitState.CLOSED;
this.failures = 0;
}
} else {
this.failures = 0;
}
return result;
} catch (error) {
this.failures++;
this.lastFailureTime = Date.now();
if (this.failures >= this.failureThreshold) {
this.state = CircuitState.OPEN;
console.error(JSON.stringify({
level: 'error',
msg: 'circuit_breaker_opened',
failures: this.failures,
}));
}
throw error;
}
}
getState(): CircuitState {
return this.state;
}
}
// Usage: ChargeResult stands in for whatever shape your payment provider returns
interface ChargeResult { status: string; [key: string]: unknown }
const paymentServiceBreaker = new CircuitBreaker(5, 30000, 3);
async function chargeCustomer(amount: number): Promise<ChargeResult> {
return paymentServiceBreaker.execute(async () => {
const response = await fetchWithRetry(
'https://payment-service.internal/charge',
{
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ amount }),
}
);
return response.json();
});
}

The biggest cost trap in scaling is spending too early on things you do not need yet, or spending too little on things that are actively losing you customers.
Here is roughly what a 1M-user application costs per month, in descending order of spend:
// Typical monthly costs at different scales
// These are real numbers from applications I have worked on
const costBreakdown = {
'10K_users': {
compute: '$50-200', // Single VPS or small cloud instance
database: '$50-100', // Managed PostgreSQL, small instance
cache: '$0-30', // Redis on same server or small instance
cdn: '$0-10', // Free tier usually sufficient
monitoring: '$0-50', // Free tier of Datadog/New Relic
totalMonthly: '$100-400',
perUserMonthly: '$0.01-0.04',
},
'100K_users': {
compute: '$500-2000', // 2-4 app servers
database: '$200-800', // Read replicas, bigger instance
cache: '$100-300', // Dedicated Redis cluster
cdn: '$50-200', // Significant bandwidth
monitoring: '$200-500', // Proper APM
backgroundJobs: '$100-300', // Dedicated worker servers
totalMonthly: '$1000-4000',
perUserMonthly: '$0.01-0.04',
},
'1M_users': {
compute: '$3000-15000', // 10-30 app servers, auto-scaling
database: '$2000-8000', // Multi-replica, connection pooling
cache: '$500-2000', // Redis cluster with replication
cdn: '$500-3000', // Multi-TB bandwidth
monitoring: '$1000-3000', // Full observability stack
backgroundJobs: '$500-2000',
search: '$500-2000', // Dedicated search cluster
totalMonthly: '$8000-35000',
perUserMonthly: '$0.008-0.035',
},
};
// THE KEY INSIGHT: Cost per user should DECREASE as you scale
// If it's increasing, something is architecturally wrong

// Auto-scaling configuration based on real metrics
// This is the logic — implement via your cloud provider's auto-scaling API
interface ScalingPolicy {
metric: string;
scaleUpThreshold: number;
scaleDownThreshold: number;
scaleUpCooldownMs: number;
scaleDownCooldownMs: number;
minInstances: number;
maxInstances: number;
}
class AutoScaler {
private lastScaleUp = 0;
private lastScaleDown = 0;
private currentInstances: number;
constructor(private policy: ScalingPolicy) {
this.currentInstances = policy.minInstances;
}
evaluate(currentMetricValue: number): 'scale_up' | 'scale_down' | 'no_change' {
const now = Date.now();
// Scale up: fast (react to load spikes quickly)
if (
currentMetricValue > this.policy.scaleUpThreshold &&
this.currentInstances < this.policy.maxInstances &&
now - this.lastScaleUp > this.policy.scaleUpCooldownMs
) {
this.currentInstances++;
this.lastScaleUp = now;
console.log(JSON.stringify({
level: 'info',
msg: 'scaling_up',
instances: this.currentInstances,
metric: currentMetricValue,
}));
return 'scale_up';
}
// Scale down: slow (avoid flapping)
if (
currentMetricValue < this.policy.scaleDownThreshold &&
this.currentInstances > this.policy.minInstances &&
now - this.lastScaleDown > this.policy.scaleDownCooldownMs
) {
this.currentInstances--;
this.lastScaleDown = now;
console.log(JSON.stringify({
level: 'info',
msg: 'scaling_down',
instances: this.currentInstances,
metric: currentMetricValue,
}));
return 'scale_down';
}
return 'no_change';
}
}
// Production scaling policy
const scaler = new AutoScaler({
metric: 'cpu_utilization',
scaleUpThreshold: 70, // Scale up at 70% CPU
scaleDownThreshold: 30, // Scale down at 30% CPU
scaleUpCooldownMs: 60000, // Wait 1 min between scale-ups
scaleDownCooldownMs: 300000, // Wait 5 min between scale-downs (slow!)
minInstances: 2, // Always at least 2 for redundancy
maxInstances: 20,
});
// COST OPTIMIZATION TIPS (from expensive experience):
//
// 1. Use spot/preemptible instances for workers (60-90% cheaper)
// But NOT for your API servers (getting terminated mid-request is bad)
//
// 2. Reserved instances for baseline capacity (30-50% cheaper)
// Spot for burst capacity
//
// 3. Database: Start with the biggest instance you can afford
// Scaling up a database is easy. Scaling it down is terrifying.
// A $400/month database instance is cheap insurance.
//
// 4. CDN: Cache aggressively. Every request the CDN serves is a request
// your servers don't have to handle. At $0.01/10K requests on origin
// vs $0.001/10K on CDN, the math is obvious.
//
// 5. Background jobs: Batch operations. Instead of sending 1000 individual
// emails, send 1 batch request to your email provider. Instead of
// 1000 individual database inserts, use COPY or multi-row INSERT.

// A framework for estimating the cost of technical debt
// Use this to justify refactoring work to your manager
interface TechDebtItem {
description: string;
weeklyEngineerHoursCost: number; // Hours spent working around it
incidentRiskPerMonth: number; // Probability of causing an incident
avgIncidentCostDollars: number; // Revenue loss + engineer time per incident
fixEffortHours: number; // Hours to fix properly
}
function calculateDebtROI(item: TechDebtItem): {
monthlyOngoingCost: number;
expectedMonthlyIncidentCost: number;
totalMonthlyCost: number;
fixCost: number;
paybackMonths: number;
} {
const engineerHourlyCost = 100; // Loaded cost (salary + benefits + overhead)
const monthlyOngoingCost = item.weeklyEngineerHoursCost * 4 * engineerHourlyCost;
const expectedMonthlyIncidentCost =
item.incidentRiskPerMonth * item.avgIncidentCostDollars;
const totalMonthlyCost = monthlyOngoingCost + expectedMonthlyIncidentCost;
const fixCost = item.fixEffortHours * engineerHourlyCost;
const paybackMonths = fixCost / totalMonthlyCost;
return {
monthlyOngoingCost,
expectedMonthlyIncidentCost,
totalMonthlyCost,
fixCost,
paybackMonths,
};
}
// Example: The unindexed query from the war story
const debtExample = calculateDebtROI({
description: 'Missing index on activity_log causing slow dashboard',
weeklyEngineerHoursCost: 2, // 2 hours/week dealing with timeouts
incidentRiskPerMonth: 0.3, // 30% chance of full outage per month
avgIncidentCostDollars: 50000, // Lost revenue + 4 engineers for 4 hours
fixEffortHours: 1, // Adding an index takes 1 hour
});
// Result:
// monthlyOngoingCost: $800
// expectedMonthlyIncidentCost: $15,000
// totalMonthlyCost: $15,800
// fixCost: $100
// paybackMonths: 0.006 (pays for itself in 4 hours)
//
// And yet, somehow, this fix sat in the backlog for 3 months
// because "we don't have time for tech debt right now"

After going through three scaling journeys, I have developed a framework for making infrastructure decisions. The core principle: add complexity only when the pain of not having it exceeds the cost of maintaining it.
SCALE UP (bigger server) when:
├── You have < 10K users
├── Your bottleneck is CPU or memory, not I/O
├── Your team is small (< 5 engineers)
├── You don't need high availability yet
└── Cost: $200 -> $800/month is acceptable
SCALE OUT (more servers) when:
├── You need zero-downtime deployments
├── A single server cannot handle the load
├── You need geographic distribution
├── You need fault tolerance (server failure = no impact)
└── You have the operational maturity to manage it
ADD A CACHE when:
├── The same data is read >10x more than it's written
├── Response times are unacceptable and DB is the bottleneck
├── You can tolerate slightly stale data (even 5 seconds)
└── NOT when: cache invalidation complexity exceeds the benefit
ADD A MESSAGE QUEUE when:
├── You have operations that take >1 second
├── You need guaranteed delivery (email, webhooks)
├── You need to decouple producers from consumers
├── You need rate limiting on downstream services
└── NOT when: synchronous processing is fast enough
SPLIT A MONOLITH when:
├── Different parts need to scale independently
├── Different parts have different deployment velocities
├── Teams are stepping on each other's toes
├── A failure in one area brings down everything
└── NOT when: your team is < 10 people
└── NOT when: you're doing it because microservices are cool
// A practical tool for deciding when to add infrastructure
interface Decision {
question: string;
thresholds: {
doNothing: string;
simple: string;
complex: string;
};
}
const decisions: Decision[] = [
{
question: 'Do you need a load balancer?',
thresholds: {
doNothing: '< 1000 RPM, single server is fine',
simple: '1K-10K RPM: Nginx reverse proxy, 2-3 servers',
complex: '> 10K RPM: L7 load balancer with health checks, auto-scaling',
},
},
{
question: 'Do you need caching?',
thresholds: {
doNothing: 'All queries < 50ms, no repeated reads',
simple: 'Add Redis cache-aside for hot data, TTL-based invalidation',
complex: 'Multi-layer caching, cache warming, event-driven invalidation',
},
},
{
question: 'Do you need read replicas?',
thresholds: {
doNothing: 'Database CPU < 60%, query times acceptable',
simple: '1 read replica, manual query routing',
complex: 'Multiple replicas, automated failover, geographic distribution',
},
},
{
question: 'Do you need background jobs?',
thresholds: {
doNothing: 'All operations complete < 200ms',
simple: 'Basic queue for emails and webhooks',
complex: 'Priority queues, DLQ, idempotency, rate limiting, scheduled jobs',
},
},
{
question: 'Do you need to break up the monolith?',
thresholds: {
doNothing: '< 5 developers, single deployable unit is manageable',
simple: 'Extract 1-2 high-traffic services',
complex: 'Full microservices with service mesh and distributed tracing',
},
},
];

None of the above matters if you cannot deploy reliably. Here is the pipeline that has served me well at every scale:
// CI/CD pipeline stages — implement these in GitHub Actions, GitLab CI,
// or whatever you use
interface PipelineStage {
name: string;
duration: string;
description: string;
}
const pipeline: PipelineStage[] = [
{
name: 'lint-and-typecheck',
duration: '1-2 min',
description: 'ESLint + TypeScript compiler. Catches 80% of bugs.',
},
{
name: 'unit-tests',
duration: '2-5 min',
description: 'Fast tests. Mocked dependencies. Run on every push.',
},
{
name: 'integration-tests',
duration: '5-15 min',
description: 'Real database, real Redis. Run on PRs and main.',
},
{
name: 'build',
duration: '3-10 min',
description: 'Build the production artifact. If this fails, nothing deploys.',
},
{
name: 'canary-deploy',
duration: '5-15 min',
description: 'Deploy to 1 server. Route 5% of traffic. Monitor error rate.',
},
{
name: 'full-deploy',
duration: '5-15 min',
description: 'Rolling deployment to all servers. Health checks between batches.',
},
{
name: 'smoke-tests',
duration: '1-2 min',
description: 'Hit critical endpoints in production. Verify they respond.',
},
];
// Canary deployment logic
async function canaryDeploy(config: {
newVersion: string;
canaryPercentage: number;
monitorDurationMs: number;
errorRateThreshold: number;
}): Promise<boolean> {
// Deploy to canary servers
console.log(`Deploying ${config.newVersion} to canary (${config.canaryPercentage}% traffic)`);
// Wait and monitor
const startTime = Date.now();
while (Date.now() - startTime < config.monitorDurationMs) {
await new Promise(r => setTimeout(r, 10000)); // Check every 10 seconds
const metrics = getRouteMetrics('_global').getMetrics();
if (metrics.errorRate > config.errorRateThreshold) {
console.error(JSON.stringify({
level: 'error',
msg: 'canary_rollback',
errorRate: metrics.errorRate,
threshold: config.errorRateThreshold,
version: config.newVersion,
}));
// Rollback canary
return false;
}
if (metrics.p99 > 2000) { // p99 > 2 seconds
console.error(JSON.stringify({
level: 'error',
msg: 'canary_rollback_latency',
p99: metrics.p99,
version: config.newVersion,
}));
return false;
}
}
console.log(`Canary passed. Proceeding with full deploy.`);
return true;
}
// Zero-downtime rolling deployment
async function rollingDeploy(
servers: string[],
newVersion: string,
batchSize: number = 1
): Promise<void> {
for (let i = 0; i < servers.length; i += batchSize) {
const batch = servers.slice(i, i + batchSize);
for (const server of batch) {
// 1. Remove from load balancer
await removeFromLoadBalancer(server);
// 2. Wait for in-flight requests to drain
await waitForDrain(server, 30000);
// 3. Deploy new version
await deployToServer(server, newVersion);
// 4. Health check
const healthy = await waitForHealthy(server, 60000);
if (!healthy) {
throw new Error(`Server ${server} failed health check after deploy`);
}
// 5. Add back to load balancer
await addToLoadBalancer(server);
}
// Wait between batches to catch issues before continuing
if (i + batchSize < servers.length) {
console.log('Waiting 30s between deployment batches...');
await new Promise(r => setTimeout(r, 30000));
}
}
}
async function removeFromLoadBalancer(_server: string): Promise<void> {
// Implementation depends on your LB (Nginx, HAProxy, cloud LB)
}
async function addToLoadBalancer(_server: string): Promise<void> {
// Implementation depends on your LB
}
async function waitForDrain(server: string, timeoutMs: number): Promise<void> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
const activeConns = await getActiveConnections(server);
if (activeConns === 0) return;
await new Promise(r => setTimeout(r, 1000));
}
// Timed out waiting for drain; proceed anyway, the LB has already stopped sending new traffic
}
async function waitForHealthy(server: string, timeoutMs: number): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
try {
const response = await fetch(`http://${server}:3000/api/health`);
if (response.ok) return true;
} catch {
// Not ready yet
}
await new Promise(r => setTimeout(r, 2000));
}
return false;
}
async function deployToServer(_server: string, _version: string): Promise<void> {
// Pull new image, restart process, etc.
}
async function getActiveConnections(_server: string): Promise<number> {
// Query your metrics endpoint
return 0;
}

Before you go to sleep tonight, ask yourself these questions about your production system:
RELIABILITY
□ Can you deploy without downtime?
□ Can a single server die without user impact?
□ Do you have automated database backups tested with restore drills?
□ Is your secret rotation automated (or at least documented)?
□ Do you have runbooks for common incidents?
PERFORMANCE
□ Do you know your p95 and p99 latency?
□ Have you run EXPLAIN ANALYZE on your top 10 queries?
□ Is your database connection pool properly sized?
□ Are your most-read data paths cached?
□ Are static assets served from a CDN?
OBSERVABILITY
□ Can you answer "what changed?" when something breaks?
□ Do you have alerts for error rate spikes?
□ Can you trace a request from the load balancer to the database?
□ Are your logs structured and queryable?
□ Do you know your SLO compliance numbers?
SECURITY (often forgotten until it's too late)
□ Are all dependencies up to date?
□ Is SQL injection impossible (parameterized queries everywhere)?
□ Are rate limits in place on all public endpoints?
□ Is sensitive data encrypted at rest and in transit?
□ Do you have a security incident response plan?
COST
□ Do you know your cost per user?
□ Are you paying for resources you're not using?
□ Are spot/preemptible instances used for non-critical workloads?
□ Is your CDN hit rate above 90% for static assets?
□ Are old logs, backups, and data being pruned?
Scaling is not a technical problem. It is an organizational problem with technical symptoms.
The team that scales successfully is not the one with the best technology stack. It is the one that monitors aggressively, deploys confidently, pays down technical debt before it becomes an emergency, and makes decisions based on data rather than anxiety.
Every recommendation in this guide comes from making the wrong choice first. I have been the engineer who shipped the unindexed query, forgot the connection pool, ignored the memory leak, and deployed without canary testing. Each mistake taught me more than any blog post could.
The irony of a blog post about scaling is that most of you do not need most of this right now. If you have fewer than 10K users, focus on the product. The single best optimization is having users who care enough to break your system.
But when they do, come back to this guide. It will be here.
Build for 10x your current load. Plan for 100x. Worry about 1000x when you actually get there.