Systematic debugging from first principles. Scientific method, memory leaks, race conditions, production debugging, war stories, and the mental models that separate juniors who flail from seniors who fix.
A junior developer stares at a bug for six hours. They add console.log after console.log, restart the server forty times, change random things hoping something works. Eventually they find it — or more accurately, they stumble into the fix without understanding why it works. A senior developer looks at the same bug, asks three questions, reads one stack trace, and has a fix in twenty minutes.
This is not talent. This is not "10x engineer" mythology. It is a learnable skill, and almost nobody teaches it explicitly.
I have spent twelve years debugging production systems. E-commerce platforms processing millions of dollars. Real-time collaboration tools. Payment systems where a bug means someone does not get paid. What I have learned is that debugging is not about tools — it is about thinking. The tools help, but without the right mental model, the fanciest debugger in the world will not save you.
This is everything I know about finding bugs fast.
Here is how most developers debug: something breaks, they open the file where they think the problem is, they add a console.log, they re-run the code, they look at the output, they move the console.log somewhere else, repeat. This is the shotgun approach — fire in every direction and hope you hit something.
The shotgun approach has three fatal problems:
It does not scale. When you have ten lines of code, adding random logs works. When you have ten thousand lines, or a hundred microservices, or a race condition that only reproduces under load — you are lost. You are adding logs to a haystack hoping to find a needle.
It does not teach you anything. Even when you stumble onto the fix, you often do not understand why it was broken. Which means you will write the same bug again. And the next time it manifests differently, you will not recognize it.
It is catastrophically slow. I timed this once. A junior on my team spent four hours debugging a timezone conversion bug using the shotgun approach. When I sat down with them and applied a systematic method, we found the root cause in twelve minutes. Not because I am smarter — because I followed a process.
Here is the dirty secret of debugging: most of the time is spent not finding the bug, but looking in the wrong place. The shotgun approach maximizes time spent looking in the wrong place. A systematic approach minimizes it.
Let me show you the cost in real numbers. Say you debug for an average of 2 hours per day (conservative for most engineers). If you are 3x slower than you could be, that is 1.3 hours wasted per day. Over a year, that is 340 hours — more than eight full work weeks — spent staring at the wrong code. For a team of ten, that is nearly two engineer-years of productivity lost to bad debugging.
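The arithmetic is easy to sanity-check. A small calculation using the assumptions above (2 debugging hours/day, 3x slower than necessary, ~250 working days/year):

```typescript
// Assumptions from the text: 2 debugging hours/day, 3x slower than necessary.
const hoursPerDay = 2;
const slowdown = 3;
const workdaysPerYear = 250;

// If the same work could be done in 1/3 the time, 2/3 of it is waste.
const wastedPerDay = hoursPerDay * (1 - 1 / slowdown); // ≈ 1.33 hours
const wastedPerYear = wastedPerDay * workdaysPerYear;  // ≈ 333 hours
const weeksLost = wastedPerYear / 40;                  // ≈ 8.3 work weeks

console.log(wastedPerDay.toFixed(2), wastedPerYear.toFixed(0), weeksLost.toFixed(1));
```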
And it gets worse in production. Every minute a critical bug is live, you are losing users, revenue, and trust. The difference between a 20-minute incident and a 4-hour incident is not a rounding error. It is the difference between "minor blip" and "customers leaving."
// The Shotgun Approach - what most devs do
function processOrder(order: Order): ProcessedOrder {
console.log('order', order); // log 1
const validated = validateOrder(order);
console.log('validated', validated); // log 2
const priced = calculatePricing(validated);
console.log('priced', priced); // log 3
const taxed = applyTax(priced);
console.log('taxed', taxed); // log 4
const discounted = applyDiscounts(taxed);
console.log('discounted', discounted); // log 5
// ... 15 more console.logs scattered everywhere
// Dev stares at terminal output, overwhelmed
return finalizeOrder(discounted);
}
// The Scientific Approach - what seniors do
// Step 1: What is the SYMPTOM? "Order total is wrong"
// Step 2: What is the EXPECTED vs ACTUAL?
// Expected: $96.39 (100 + 7.1% tax, minus the 10% SAVE10 discount)
// Actual: $107.10 (no discount applied)
// Step 3: HYPOTHESIS: The discount function is broken or not called
// Step 4: ONE targeted log to confirm:
function processOrder(order: Order): ProcessedOrder {
const validated = validateOrder(order);
const priced = calculatePricing(validated);
const taxed = applyTax(priced);
console.log('Before discount:', taxed.total, 'Discount code:', taxed.discountCode);
const discounted = applyDiscounts(taxed);
console.log('After discount:', discounted.total, 'Applied:', discounted.discountApplied);
return finalizeOrder(discounted);
}
// Output: Before discount: 107.10, Discount code: SAVE10
// After discount: 107.10, Applied: false
// Root cause found in seconds: discount code validation is case-sensitive
Two targeted logs versus fifteen random ones. Same bug, fraction of the time.
Every good debugger follows the same process, whether they know it or not. It is the scientific method, applied to code: observe precisely, form a falsifiable hypothesis, run one targeted experiment, and let the result narrow the search.
Let me break each step down with real examples.
The difference between a junior's bug report and a senior's is precision:
"TypeError: Cannot read properties of undefined (reading 'id') when the request body includes a coupon field but no coupon.type sub-field. Requests without coupon succeed."
The senior's observation contains everything needed to debug: the error type, the error message, the specific condition that triggers it, and a known-working case for comparison.
Train yourself to answer these questions before you touch any code: what exactly fails, under what conditions, what still works for comparison, and what changed recently.
// Real example: API returns 500 intermittently
// Bad observation: "API sometimes fails"
// Good observation:
/*
- POST /api/users/bulk-import returns 500
- Happens when payload has more than 100 users
- Works fine with 50 users, fails at 150
- Error: "PayloadTooLargeError: request entity too large"
- Started after deploy on March 15
- The March 15 deploy refactored the Express middleware setup
- The refactor dropped the explicit body-size limit, falling back to the 100kb default
*/
// That observation tells you exactly where to look:
import express from 'express';
const app = express();
// The fix is obvious once the observation is precise
app.use(express.json({ limit: '5mb' }));
A good hypothesis is specific and falsifiable. "Something is wrong with the database" is not a hypothesis. "The query is returning stale data because the read replica has replication lag greater than our cache TTL" is a hypothesis.
The key skill is ranking hypotheses by probability. When you see a bug, there are often dozens of possible causes. Experienced debuggers instinctively rank them:
Most likely:
1. Recent code change introduced a bug (what changed recently?)
2. Input data is different from what we expected
3. Environment/config difference (dev vs prod, missing env var)
Less likely:
4. Dependency version mismatch
5. Infrastructure issue (DNS, network, disk)
Least likely:
6. Compiler/runtime bug
7. Hardware failure
8. Cosmic rays flipping bits (yes, this actually happens)
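One way to make the ranking concrete is to encode each hypothesis as a cheap check, ordered by prior probability, and stop at the first one that confirms. A sketch; the checks and their probabilities here are made up for illustration:

```typescript
interface Hypothesis {
  name: string;
  prior: number; // rough probability, used only for ordering
  test: () => boolean; // cheap check that confirms or rules it out
}

// Hypothetical checks; in practice each would inspect git log, inputs, env vars, etc.
const hypotheses: Hypothesis[] = [
  { name: 'recent code change', prior: 0.5, test: () => true },
  { name: 'unexpected input data', prior: 0.3, test: () => false },
  { name: 'env/config difference', prior: 0.15, test: () => false },
  { name: 'dependency mismatch', prior: 0.05, test: () => false },
];

function mostLikelyCause(list: Hypothesis[]): string | undefined {
  // Test in descending order of prior; the first confirmation wins.
  const ordered = [...list].sort((a, b) => b.prior - a.prior);
  return ordered.find((h) => h.test())?.name;
}

console.log(mostLikelyCause(hypotheses)); // "recent code change" under these stub checks
```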
The "What Changed?" Principle is the single most powerful debugging heuristic. If something was working yesterday and is broken today, the bug is almost certainly in whatever changed between yesterday and today. This sounds obvious, but developers constantly ignore it, diving into code that has been stable for months while a fresh commit sits unexamined.
# The first thing I do when investigating a production bug:
git log --oneline --since="2 days ago"
# If there was a deploy:
git diff HEAD~5..HEAD --stat # What files changed in the last 5 commits?
# If it's infrastructure:
# Check recent config changes, scaling events, certificate renewals
# If it's data-related:
# Check for recent data migrations, bulk imports, schema changes
This is the technique that separates 10-minute debugging from 10-hour debugging. Instead of searching linearly through your code, bisect the problem space.
If a request goes through 8 functions before returning the wrong result, do not start at function 1 and add logs through function 8. Start at function 4. Is the data correct at that point? If yes, the bug is in functions 5-8. If no, it is in functions 1-4. You just eliminated half the code in one step.
// A pipeline with 8 stages. The output is wrong.
async function processTransaction(input: RawTransaction): Promise<Result> {
const parsed = parseInput(input); // Stage 1
const validated = validate(parsed); // Stage 2
const enriched = await enrich(validated); // Stage 3
const normalized = normalize(enriched); // Stage 4
// === CHECK HERE FIRST ===
// console.log('midpoint check:', JSON.stringify(normalized, null, 2));
const scored = calculateRisk(normalized); // Stage 5
const routed = routeTransaction(scored); // Stage 6
const executed = await execute(routed); // Stage 7
const confirmed = await confirm(executed); // Stage 8
return confirmed;
}
// If `normalized` looks correct → bug is in stages 5-8
// If `normalized` looks wrong → bug is in stages 1-4
// Next: check stage 2 or stage 6 (depending on first result)
// In 3 checks, you've narrowed 8 stages down to 1
// That's O(log n) vs O(n) — the same improvement as binary search
git bisect is the version-control equivalent of this technique, and it is criminally underused:
# Something broke between v2.1.0 and HEAD. Find the commit.
git bisect start
git bisect bad HEAD # Current version is broken
git bisect good v2.1.0 # This version was working
# Git checks out the middle commit. Test it.
npm test
git bisect good # Tests pass? Middle commit is fine.
# Git checks out the middle of the remaining range. Test again.
npm test
git bisect bad # Tests fail? Bug is before this commit.
# In ~7 steps, git finds the exact commit out of 100+
# When done:
git bisect reset
I have used git bisect to find bugs in codebases with thousands of commits. A bug that would take hours to trace through code is found in minutes because you are leveraging the version history as a search space.
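If your failing test is scriptable, git bisect can even drive itself: git bisect run executes a command at each step and uses its exit code (0 means good, non-zero means bad) to decide which half to search next:

```shell
# Automate the whole bisect: git runs the command at every step
git bisect start
git bisect bad HEAD
git bisect good v2.1.0
git bisect run npm test   # exit 0 marks the commit good, non-zero bad
git bisect reset
```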
Each test should confirm or disprove exactly one hypothesis. If you change three things at once and the bug goes away, you do not know which change fixed it. Worse, you might have introduced a new bug that happens to mask the old one.
// Hypothesis: The bug is caused by timezone conversion in the date filter
// BAD experiment: Change the timezone handling AND the query AND the cache
// If it works, you don't know what fixed it
// GOOD experiment: Isolate the timezone conversion
const testDate = new Date('2026-01-15T00:00:00Z');
const converted = convertToUserTimezone(testDate, 'America/New_York');
console.log('Input (UTC):', testDate.toISOString());
console.log('Output (ET):', converted.toISOString());
console.log('Expected: 2026-01-14T19:00:00-05:00');
// If the output matches expected: timezone conversion is fine, look elsewhere
// If it doesn't match: you found the buggy function, now debug THAT
Most developers glance at error messages. They see "TypeError" and immediately start guessing. Senior developers read the entire error message — and I mean the entire thing, including the part they think they already understand.
TypeError: Cannot read properties of undefined (reading 'email')
at UserService.getProfile (/app/src/services/user.ts:47:32)
at async AuthMiddleware.validateSession (/app/src/middleware/auth.ts:23:18)
at async /app/src/routes/profile.ts:12:5
at async Layer.handle [as handle_request] (/app/node_modules/express/lib/router/layer.js:95:5)
Here is what each part tells you:
Error type (TypeError): The nature of the problem. TypeError means you tried to use a value as a type it is not — typically undefined or null where you expected an object.
Error message (Cannot read properties of undefined (reading 'email')): Something was undefined and you tried to access .email on it. The variable one step before .email is the culprit.
First stack frame (UserService.getProfile, user.ts:47:32): The exact file, line, and column where the error occurred. Line 47, column 32.
Call stack (remaining frames): How you got there. Read bottom to top: the request came through Express, hit the profile route at line 12, went through auth middleware at line 23, which called getProfile at line 47.
The most important frame is usually NOT the first one. The first frame tells you where the error surfaced. The cause is often two or three frames up. In this case, the real question is: why did AuthMiddleware.validateSession pass an undefined user to getProfile?
// The buggy code at user.ts:47
class UserService {
async getProfile(userId: string): Promise<UserProfile> {
const user = await this.db.users.findUnique({ where: { id: userId } });
return {
name: user.name, // line 47: user is undefined because findUnique
email: user.email, // returned null (user not found)
avatar: user.avatar,
};
}
}
// The REAL bug is in auth.ts:23
class AuthMiddleware {
async validateSession(req: Request): Promise<User> {
const session = await this.sessions.get(req.cookies.sessionId);
// Bug: session exists but session.userId points to a deleted user
// No null check on the result
return this.userService.getProfile(session.userId);
}
}
// The fix
class UserService {
async getProfile(userId: string): Promise<UserProfile | null> {
const user = await this.db.users.findUnique({ where: { id: userId } });
if (!user) return null;
return {
name: user.name,
email: user.email,
avatar: user.avatar,
};
}
}
ECONNREFUSED — "Connection refused." The service you are trying to reach is not running, or it is running on a different port/host than you think. Nine times out of ten, this means your database or Redis server is not started, or your .env has the wrong connection string.
// You see:
// Error: connect ECONNREFUSED 127.0.0.1:5432
// What you think: "Something is wrong with my database code"
// What it actually means: PostgreSQL isn't running on localhost:5432
// Debug checklist:
// 1. Is the service actually running? (pg_isready, redis-cli ping)
// 2. Is it running on the port you think? (check config, check .env)
// 3. Is a firewall blocking the connection? (common in Docker setups)
// 4. Are you connecting to localhost when the service is in a container?
// Common Docker mistake:
// Your app is in a container, DB is in another container
// "localhost" inside the app container means the app container, not the DB
const dbUrl = process.env.DATABASE_URL;
// Wrong: postgresql://user:pass@localhost:5432/db
// Right: postgresql://user:pass@db-container:5432/db
ENOMEM / "JavaScript heap out of memory" — You ran out of memory. This is either a memory leak (discussed later) or you are processing too much data at once.
// You see:
// FATAL ERROR: Reached heap limit Allocation failed
// - JavaScript heap out of memory
// Common cause: loading an entire table into memory
// BAD
const allUsers = await db.users.findMany(); // 2 million rows
const filtered = allUsers.filter(u => u.active); // Boom. OOM.
// GOOD: Filter in the database
const activeUsers = await db.users.findMany({
where: { active: true },
select: { id: true, email: true }, // Only select what you need
});
// If you must process large datasets, page through with a cursor instead:
let cursor: string | undefined;
do {
const batch = await db.users.findMany({
take: 1000,
orderBy: { id: 'asc' },
...(cursor ? { cursor: { id: cursor }, skip: 1 } : {}),
});
if (batch.length === 0) break;
await processBatch(batch);
cursor = batch.at(-1)?.id;
} while (cursor);
ERR_MODULE_NOT_FOUND vs MODULE_NOT_FOUND — These look similar but have different causes. ERR_MODULE_NOT_FOUND is the ESM loader failing (wrong import path, missing file extension, package.json "type": "module" issues). MODULE_NOT_FOUND is the CommonJS require() failing (package not installed, wrong path).
// ERR_MODULE_NOT_FOUND: ESM import issue
// "Cannot find module '/app/src/utils' imported from '/app/src/index.ts'"
// Fix: Add the file extension
import { helper } from './utils.js'; // Yes, .js even for .ts files in ESM
// MODULE_NOT_FOUND: Package not installed
// "Cannot find module 'lodash'"
// Fix: npm install lodash
EACCES: permission denied — Your process does not have permission to access a file or port. On Unix, ports below 1024 require root. File operations fail if the Node process user does not own the file.
# "Error: listen EACCES: permission denied 0.0.0.0:80"
# Fix: Use a port above 1024, or use a reverse proxy (nginx)
# "EACCES: permission denied, open '/var/log/app.log'"
# Fix: Check file ownership and permissions
ls -la /var/log/app.log
chown nodeuser:nodeuser /var/log/app.log
Unhandled Promise Rejection — The most insidious Node.js error. An async function threw an error, and nobody caught it. In Node 15+, this crashes the process by default.
// This will crash your process with no useful stack trace:
async function fetchData() {
const res = await fetch('https://api.example.com/data');
return res.json();
}
fetchData(); // No .catch(), no try/catch wrapper
// This is worse — it silently swallows the error:
fetchData().catch(() => {}); // "I'll deal with errors later" (narrator: they never did)
// This is correct:
async function fetchData(): Promise<ApiData> {
try {
const res = await fetch('https://api.example.com/data');
if (!res.ok) {
throw new Error(`API returned ${res.status}: ${await res.text()}`);
}
return res.json();
} catch (error) {
logger.error('Failed to fetch data', {
error: error instanceof Error ? error.message : String(error),
stack: error instanceof Error ? error.stack : undefined,
});
throw error; // Re-throw so the caller knows it failed
}
}
// Global safety net (does NOT replace proper error handling):
process.on('unhandledRejection', (reason, promise) => {
logger.fatal('Unhandled rejection', {
reason: reason instanceof Error ? reason.message : String(reason),
stack: reason instanceof Error ? reason.stack : undefined,
});
// In production, you probably want to gracefully shut down
process.exit(1);
});
Segmentation Fault (SIGSEGV) — Your Node.js process touched memory it should not have. This is almost always a native addon bug or a Node.js/V8 bug, not your JavaScript code. Check your native dependencies (sharp, bcrypt, canvas, etc.) and make sure they are compiled for your platform and Node version.
# When you see a segfault, enable Node's diagnostic report:
node --report-on-fatalerror your-app.js
# Or run with core dumps enabled:
ulimit -c unlimited
node your-app.js
# After crash, analyze with lldb or gdb:
lldb node -c core.12345
console.log is a tool, not a strategy. Here is what senior engineers actually use.
The debugger Statement
Drop a debugger statement in your code and run Node with --inspect. The execution will pause at that line, and you can inspect every variable in scope.
async function calculateDiscount(order: Order, coupon: Coupon): Promise<number> {
const basePrice = order.items.reduce((sum, item) => sum + item.price * item.qty, 0);
const eligibleItems = order.items.filter(item => coupon.categories.includes(item.category));
debugger; // Execution pauses here. Inspect basePrice, eligibleItems, coupon.
const discount = eligibleItems.reduce((sum, item) => {
return sum + (item.price * item.qty * coupon.percentage) / 100;
}, 0);
return Math.min(discount, coupon.maxDiscount ?? Infinity);
}
# Run with inspect:
node --inspect dist/server.js
# Or for immediate break (useful for startup bugs):
node --inspect-brk dist/server.js
# Then open Chrome and navigate to chrome://inspect
# Click "inspect" on your Node.js process
# You get the full Chrome DevTools for your server-side code
Stop clicking "Run" and staring at terminal output. Set up proper debugging:
// .vscode/launch.json
{
"version": "0.2.0",
"configurations": [
{
"name": "Debug Server",
"type": "node",
"request": "launch",
"runtimeExecutable": "node",
"runtimeArgs": ["--loader", "tsx"],
"args": ["src/server.ts"],
"env": {
"NODE_ENV": "development",
"DEBUG": "app:*"
},
"console": "integratedTerminal",
"sourceMaps": true,
"resolveSourceMapLocations": [
"${workspaceFolder}/**",
"!**/node_modules/**"
]
},
{
"name": "Debug Current Test",
"type": "node",
"request": "launch",
"runtimeExecutable": "npx",
"runtimeArgs": ["vitest", "run", "${relativeFile}"],
"console": "integratedTerminal",
"sourceMaps": true
},
{
"name": "Attach to Running Process",
"type": "node",
"request": "attach",
"port": 9229,
"restart": true,
"sourceMaps": true
}
]
}
The killer feature most developers never use. Instead of breaking on every iteration of a loop, break only when a condition is true.
In VS Code: right-click the gutter next to a line number, select "Conditional Breakpoint", and enter an expression:
// Say you have a loop processing 10,000 orders
// and only order #7,392 has the bug
for (const order of orders) {
const result = processOrder(order);
// Set a conditional breakpoint here with expression:
// order.id === '7392' || result.total < 0
results.push(result);
}
// The debugger will only pause when that specific order is hit
// or when the total goes negative — no need to click "Continue" 7,391 times
VS Code logpoints let you add temporary logging without modifying your code. Right-click the gutter, select "Logpoint", and enter a message with expressions in curly braces:
// Logpoint message:
// Order {order.id}: total={result.total}, items={order.items.length}
// This prints to the debug console without pausing execution
// AND without modifying your source code
// Perfect for debugging production-like environments where you
// don't want to commit debug logs
When you connect Chrome DevTools to your Node process (node --inspect), you get much more than a debugger.
Performance Tab — Record CPU activity to find what is eating your processor time:
// If your server is using 100% CPU, don't guess — profile it.
// 1. Open chrome://inspect, click "inspect" on your Node process
// 2. Go to the "Performance" tab (previously "Profiler")
// 3. Click "Record", trigger the slow operation, click "Stop"
// 4. Look at the flame chart — the widest bars are where time is spent
// Common findings:
// - JSON.parse/JSON.stringify on huge objects
// - Regular expressions with catastrophic backtracking
// - Synchronous crypto operations (use async versions)
// - Repeated database queries in a loop (N+1 problem)
Memory Tab — Take heap snapshots to find memory leaks:
// 1. Open Chrome DevTools on your Node process
// 2. Go to "Memory" tab
// 3. Take a heap snapshot (baseline)
// 4. Perform the operation you suspect leaks memory
// 5. Take another heap snapshot
// 6. Select "Comparison" view between snapshot 1 and 2
// 7. Sort by "Size Delta" — the biggest growth is your leak
// You can also use "Allocation instrumentation on timeline"
// to see exactly when and where allocations happen
Network Tab (for browser debugging) — The waterfall view shows you exactly why your page is slow:
// Things to look for in the Network waterfall:
// 1. Blocking chains: Request B can't start until Request A finishes
// Fix: Parallelize with Promise.all, or preload critical resources
//
// 2. Huge payloads: A 5MB JSON response when you need 3 fields
// Fix: GraphQL, sparse fieldsets, pagination, server-side filtering
//
// 3. Redundant requests: The same API called 5 times
// Fix: Request deduplication, caching, React Query/SWR
//
// 4. Slow TTFB (Time to First Byte): Server is slow
// Fix: Profile the server, add caching, optimize queries
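The first fix in that list, parallelizing a blocking chain, is worth spelling out. A sketch with hypothetical fetchUser/fetchOrders helpers standing in for two independent requests:

```typescript
// Hypothetical stand-ins for two independent network calls.
async function fetchUser(id: string): Promise<{ id: string; name: string }> {
  return { id, name: 'Ada' };
}
async function fetchOrders(userId: string): Promise<string[]> {
  return [`order-1-${userId}`];
}

// Serial: fetchOrders cannot start until fetchUser resolves,
// even though it never uses the result. Total time = A + B.
async function loadProfileSerial(userId: string) {
  const user = await fetchUser(userId);
  const orders = await fetchOrders(userId);
  return { user, orders };
}

// Parallel: both requests start immediately. Total time = max(A, B).
async function loadProfileParallel(userId: string) {
  const [user, orders] = await Promise.all([fetchUser(userId), fetchOrders(userId)]);
  return { user, orders };
}
```

The same shape applies to any pair of awaits where the second does not depend on the first.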
The debug Module
The debug npm package is absurdly useful for libraries and complex applications. It lets you add namespace-scoped debug logging that is completely silent unless enabled:
import createDebug from 'debug';
const debug = createDebug('app:orders');
const debugDb = createDebug('app:orders:db');
const debugCache = createDebug('app:orders:cache');
async function getOrder(id: string): Promise<Order> {
debug('Fetching order %s', id);
const cached = await cache.get(`order:${id}`);
if (cached) {
debugCache('Cache hit for order %s', id);
return cached;
}
debugCache('Cache miss for order %s, querying DB', id);
const order = await db.orders.findUnique({ where: { id } });
debugDb('DB query returned %d items for order %s', order ? 1 : 0, id);
if (order) {
await cache.set(`order:${id}`, order, { ttl: 300 });
debugCache('Cached order %s with TTL 300s', id);
}
return order;
}
# See all debug output:
DEBUG=app:* node server.js
# See only database-related debug output:
DEBUG=app:*:db node server.js
# See everything except cache:
DEBUG=app:*,-app:*:cache node server.js
# In production, leave DEBUG unset — zero overhead
wtfnode — Find What Is Keeping Node Alive
When your Node process does not exit cleanly, something is holding it open. Open handles — timers, sockets, event listeners — keep the event loop alive:
// Install: npm install wtfnode
// At the top of your entry point:
import wtf from 'wtfnode';
// When your process should have exited but didn't:
// Send SIGINFO (Ctrl+T on macOS, or call wtf.dump())
setTimeout(() => {
wtf.dump();
}, 5000);
// Output looks like:
// [WTF Node?] open handles:
// - Timers:
// - (10000 ~ 10 s) (anonymous) @ /app/src/lib/cache.ts:45
// - TCP sockets:
// - 127.0.0.1:5432 (connected to database, not closed)
// - Child processes:
// - PID 12345 (spawn)Memory leaks do not crash your app immediately. They slowly consume memory over hours or days until either the OS kills the process (OOMKiller) or garbage collection pauses become so long that your app is effectively frozen. They are the most insidious bugs because everything works fine in development (where processes restart constantly) and only manifests in production under sustained load.
Method 1: Process Memory Monitoring
// Add this to your server for continuous memory monitoring
const MEMORY_CHECK_INTERVAL = 60_000; // Every minute
const MEMORY_WARN_THRESHOLD = 512 * 1024 * 1024; // 512MB
const MEMORY_CRITICAL_THRESHOLD = 1024 * 1024 * 1024; // 1GB
setInterval(() => {
const usage = process.memoryUsage();
const heapUsedMB = Math.round(usage.heapUsed / 1024 / 1024);
const rssMB = Math.round(usage.rss / 1024 / 1024);
logger.info('Memory usage', {
heapUsed: `${heapUsedMB}MB`,
heapTotal: `${Math.round(usage.heapTotal / 1024 / 1024)}MB`,
rss: `${rssMB}MB`,
external: `${Math.round(usage.external / 1024 / 1024)}MB`,
arrayBuffers: `${Math.round(usage.arrayBuffers / 1024 / 1024)}MB`,
});
if (usage.heapUsed > MEMORY_CRITICAL_THRESHOLD) {
logger.fatal('CRITICAL: Memory usage exceeded 1GB', { heapUsedMB });
// Take a heap snapshot for later analysis
// Then gracefully restart
} else if (usage.heapUsed > MEMORY_WARN_THRESHOLD) {
logger.warn('WARNING: Memory usage high', { heapUsedMB });
}
}, MEMORY_CHECK_INTERVAL);
Method 2: Heap Snapshots
import v8 from 'node:v8';
function takeHeapSnapshot(label: string): string {
const filename = `/tmp/heap-${label}-${Date.now()}.heapsnapshot`;
// writeHeapSnapshot returns the path it actually wrote to
const writtenPath = v8.writeHeapSnapshot(filename);
logger.info(`Heap snapshot written to ${writtenPath}`);
return writtenPath;
}
// Take snapshots at different points to compare:
// 1. Right after server startup (baseline)
// 2. After processing 1000 requests
// 3. After processing 10000 requests
// If memory grows linearly with requests, you have a leak
Leak 1: Event Listeners That Are Never Removed
This is the number one cause of memory leaks in Node.js applications:
// BUGGY: Leaks memory on every request
class NotificationService {
private emitter: EventEmitter;
constructor(emitter: EventEmitter) {
this.emitter = emitter;
}
async handleRequest(req: Request, res: Response): Promise<void> {
// This adds a NEW listener on EVERY request
// After 10,000 requests, there are 10,000 listeners
// Each listener closes over `res`, preventing GC
this.emitter.on('notification', (data) => {
res.write(`data: ${JSON.stringify(data)}\n\n`);
});
req.on('close', () => {
// Even with this cleanup, we're using .on() not .once()
// and we don't have a reference to the specific listener to remove
});
}
}
// FIXED: Properly manage listeners
class NotificationService {
private emitter: EventEmitter;
constructor(emitter: EventEmitter) {
this.emitter = emitter;
}
async handleRequest(req: Request, res: Response): Promise<void> {
// Create a named function so we can remove it specifically
const listener = (data: unknown) => {
res.write(`data: ${JSON.stringify(data)}\n\n`);
};
this.emitter.on('notification', listener);
// Clean up when the client disconnects
req.on('close', () => {
this.emitter.removeListener('notification', listener);
});
// Note: Node's EventEmitter#on does not accept an AbortSignal option,
// so the explicit removeListener above is the reliable cleanup here.
}
}
Leak 2: Closures Holding References to Large Objects
// BUGGY: Closure keeps the entire `bigData` object alive
function processFile(path: string): () => string {
const bigData = fs.readFileSync(path); // 500MB file
const summary = computeSummary(bigData); // Small string
// This closure captures the entire scope, including `bigData`
// Even though it only uses `summary`, V8 may retain `bigData`
return () => {
return `File processed: ${summary}`;
};
}
// The returned function keeps `bigData` in memory forever
const getSummary = processFile('/data/huge-export.csv');
// FIXED: Null out references you don't need
function processFile(path: string): () => string {
let bigData: Buffer | null = fs.readFileSync(path);
const summary = computeSummary(bigData);
bigData = null; // Explicitly release the reference
return () => {
return `File processed: ${summary}`;
};
}
// Even better: structure the code so the closure never captures it
function processFile(path: string): () => string {
const summary = computeSummaryFromPath(path); // Process and return only what's needed
return () => {
return `File processed: ${summary}`;
};
}
Leak 3: Unbounded Caches and Maps
// BUGGY: Cache grows forever
const cache = new Map<string, UserProfile>();
async function getUserProfile(userId: string): Promise<UserProfile> {
if (cache.has(userId)) {
return cache.get(userId)!;
}
const profile = await db.users.findUnique({ where: { id: userId } });
cache.set(userId, profile); // Never evicted. Map grows until OOM.
return profile;
}
// FIXED: Use a bounded cache with TTL
import { LRUCache } from 'lru-cache';
const cache = new LRUCache<string, UserProfile>({
max: 10_000, // Maximum 10,000 entries
ttl: 5 * 60 * 1000, // Entries expire after 5 minutes
maxSize: 50_000_000, // Maximum 50MB total size
sizeCalculation: (value) => JSON.stringify(value).length,
});
async function getUserProfile(userId: string): Promise<UserProfile> {
const cached = cache.get(userId);
if (cached) return cached;
const profile = await db.users.findUnique({ where: { id: userId } });
cache.set(userId, profile);
return profile;
}
// ALSO FIXED: Use WeakMap when the key is an object
// WeakMap entries are GC'd when the key is no longer referenced elsewhere
const metadata = new WeakMap<Request, RequestMetadata>();
function middleware(req: Request, res: Response, next: NextFunction): void {
metadata.set(req, { startTime: Date.now(), traceId: generateTraceId() });
// When `req` is garbage collected (after response), the entry is too
next();
}
Leak 4: Uncleared Timers and Intervals
// BUGGY: Interval never cleared
class ConnectionPool {
private healthCheckInterval: NodeJS.Timeout;
constructor() {
// This interval keeps the pool object alive forever
// Even if you null out the pool reference
this.healthCheckInterval = setInterval(() => {
this.checkConnections(); // `this` reference prevents GC
}, 30_000);
}
// No cleanup method!
}
// Each time you create a new pool (e.g., in tests), the old one leaks
let pool = new ConnectionPool(); // Interval running
pool = new ConnectionPool(); // OLD interval still running + new one
pool = new ConnectionPool(); // Now 3 intervals running
// FIXED: Always provide cleanup
class ConnectionPool {
private healthCheckInterval: NodeJS.Timeout;
constructor() {
this.healthCheckInterval = setInterval(() => {
this.checkConnections();
}, 30_000);
// Prevent the timer from keeping the process alive
this.healthCheckInterval.unref();
}
async destroy(): Promise<void> {
clearInterval(this.healthCheckInterval);
await this.closeAllConnections();
}
}
// Use with explicit cleanup
const pool = new ConnectionPool();
process.on('SIGTERM', async () => {
await pool.destroy();
process.exit(0);
});

Leak 5: Accumulating Global State
// BUGGY: Array grows with every request
const requestLog: Array<{ url: string; timestamp: number; body: unknown }> = [];
app.use((req, res, next) => {
requestLog.push({
url: req.url,
timestamp: Date.now(),
body: req.body, // Holds reference to potentially large request bodies
});
next();
});
// After 1 million requests, this array is enormous
// And it can never be garbage collected because it's module-scoped
// FIXED: Use a ring buffer with a fixed size
class RingBuffer<T> {
private buffer: Array<T | undefined>;
private index = 0;
constructor(private capacity: number) {
this.buffer = new Array(capacity);
}
push(item: T): void {
this.buffer[this.index % this.capacity] = item;
this.index++;
}
getRecent(count: number): T[] {
const result: T[] = [];
const start = Math.max(0, this.index - count);
for (let i = start; i < this.index; i++) {
const item = this.buffer[i % this.capacity];
if (item !== undefined) result.push(item);
}
return result;
}
}
const requestLog = new RingBuffer<{ url: string; timestamp: number }>(1000);
app.use((req, res, next) => {
requestLog.push({
url: req.url,
timestamp: Date.now(),
// Don't store request bodies — log them to a proper logging system
});
next();
});

Race conditions are the boss fight of debugging. They are intermittent, hard to reproduce, and they make you question your sanity. The code looks correct. The logic is sound. But under certain timing conditions — conditions you cannot easily control — it breaks.
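The whole class can be reduced to a few lines. This is a sketch, not production code: `withdraw` and the `setTimeout` "I/O gap" are illustrative stand-ins for a real database call, chosen so the race reproduces deterministically.

```typescript
// Sketch of a check-then-act race. The await between the check and the
// write is the gap where another caller sneaks in.
let balance = 100;

async function withdraw(amount: number): Promise<boolean> {
  if (balance < amount) return false; // check
  await new Promise((r) => setTimeout(r, 10)); // simulated I/O gap
  balance -= amount; // act: the check above is stale by now
  return true;
}

async function demo(): Promise<void> {
  // Two concurrent withdrawals of 100 against a balance of 100
  const [a, b] = await Promise.all([withdraw(100), withdraw(100)]);
  console.log(a, b, balance); // true true -100 (both passed the stale check)
}

demo();
```

Here the interleaving is guaranteed, so it fails every run. Real races have the same gap but a rare interleaving, which is exactly why they are so hard to reproduce.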
The Missing await

This is the single most common async bug in TypeScript. It looks so innocuous that it passes code review:
// BUGGY: Missing await
async function createUser(data: CreateUserInput): Promise<User> {
const user = await db.users.create({ data });
// BUG: This fires and is immediately forgotten
// If it fails, the error is swallowed as an unhandled rejection
// If the response is sent before this completes, the email never sends
sendWelcomeEmail(user.email); // Missing await!
await db.audit.create({
data: { action: 'USER_CREATED', userId: user.id },
});
return user;
}
// FIXED: Await or explicitly handle fire-and-forget
async function createUser(data: CreateUserInput): Promise<User> {
const user = await db.users.create({ data });
// Option 1: Await it (blocks the response until email is sent)
await sendWelcomeEmail(user.email);
// Option 2: Fire-and-forget with explicit error handling
sendWelcomeEmail(user.email).catch((error) => {
logger.error('Failed to send welcome email', {
userId: user.id,
error: error.message,
});
// Queue for retry
emailRetryQueue.add({ email: user.email, template: 'welcome' });
});
// Option 3: Use a proper job queue (best for production)
await emailQueue.add('welcome', { userId: user.id, email: user.email });
await db.audit.create({
data: { action: 'USER_CREATED', userId: user.id },
});
return user;
}

How to catch missing awaits with TypeScript:
// tsconfig.json — enable these rules
{
"compilerOptions": {
"strict": true,
"noUncheckedIndexedAccess": true
}
}
// eslint config — the key rule
// @typescript-eslint/no-floating-promises: "error"
// This catches any Promise that isn't awaited, returned, or .catch()'d

This pattern looks safe but is broken under concurrent access:
// BUGGY: Race condition in "check then act"
async function reserveSeat(eventId: string, userId: string): Promise<boolean> {
// Step 1: Check if seats are available
const event = await db.events.findUnique({ where: { id: eventId } });
if (event.availableSeats <= 0) {
return false; // No seats left
}
// BUG: Between the check above and the update below,
// another request can also pass the check.
// With 1 seat left and 2 concurrent requests,
// both see "1 seat available" and both proceed.
// Result: -1 available seats (oversold!)
// Step 2: Reserve the seat
await db.events.update({
where: { id: eventId },
data: { availableSeats: event.availableSeats - 1 },
});
await db.reservations.create({
data: { eventId, userId },
});
return true;
}
// FIXED: Atomic check-and-update
async function reserveSeat(eventId: string, userId: string): Promise<boolean> {
// Use an atomic conditional update
// This is a single operation — no gap for a race condition
const result = await db.events.updateMany({
where: {
id: eventId,
availableSeats: { gt: 0 }, // Only update if seats > 0
},
data: {
availableSeats: { decrement: 1 },
},
});
if (result.count === 0) {
return false; // No seats were available (or event not found)
}
await db.reservations.create({
data: { eventId, userId },
});
return true;
}
// ALTERNATIVE FIX: Use database-level locking
async function reserveSeat(eventId: string, userId: string): Promise<boolean> {
return await db.$transaction(async (tx) => {
// SELECT ... FOR UPDATE acquires a row-level lock
const event = await tx.$queryRaw<Array<{ available_seats: number }>>`
SELECT available_seats FROM events WHERE id = ${eventId} FOR UPDATE
`;
if (!event[0] || event[0].available_seats <= 0) {
return false;
}
await tx.events.update({
where: { id: eventId },
data: { availableSeats: { decrement: 1 } },
});
await tx.reservations.create({
data: { eventId, userId },
});
return true;
});
}

Users click "Submit" twice. Your server processes both requests. Now there are two orders, two charges, two records:
// BUGGY: No idempotency protection
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
await chargePayment(order.total, req.body.paymentMethod);
res.json(order);
});
// User double-clicks → two orders, two charges
// FIXED: Idempotency key
app.post('/api/orders', async (req, res) => {
const idempotencyKey = req.headers['x-idempotency-key'];
if (!idempotencyKey) {
return res.status(400).json({ error: 'Missing X-Idempotency-Key header' });
}
// Check if we've already processed this request
const existing = await db.idempotencyKeys.findUnique({
where: { key: idempotencyKey },
});
if (existing) {
// Return the same response we returned before
return res.status(existing.statusCode).json(existing.responseBody);
}
// Process the request
const order = await createOrder(req.body);
await chargePayment(order.total, req.body.paymentMethod);
// Store the result keyed by the idempotency key.
// Note: `key` needs a UNIQUE constraint. Two concurrent requests with the
// same key can both miss the lookup above (the same check-then-act race
// as before); the constraint makes the second insert fail instead of
// allowing a double charge.
await db.idempotencyKeys.create({
data: {
key: idempotencyKey,
statusCode: 200,
responseBody: order,
expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24h
},
});
res.json(order);
});
// Client side:
async function submitOrder(orderData: OrderInput): Promise<Order> {
const idempotencyKey = crypto.randomUUID(); // Generate once per user action
const response = await fetch('/api/orders', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Idempotency-Key': idempotencyKey,
},
body: JSON.stringify(orderData),
});
return response.json();
}

// BUGGY: Shared mutable state
class OrderProcessor {
private currentBatch: Order[] = [];
async addToBatch(order: Order): Promise<void> {
this.currentBatch.push(order);
if (this.currentBatch.length >= 100) {
// BUG: While we're awaiting processBatch, another call to addToBatch
// can push to the same array AND trigger another processBatch call.
// Result: some orders processed twice, some skipped.
await this.processBatch(this.currentBatch);
this.currentBatch = [];
}
}
private async processBatch(orders: Order[]): Promise<void> {
await db.orders.createMany({ data: orders });
}
}
// FIXED: Swap-and-process pattern
class OrderProcessor {
private processing = false;
private queue: Order[] = [];
async addToBatch(order: Order): Promise<void> {
this.queue.push(order);
await this.flush();
}
private async flush(): Promise<void> {
if (this.processing) return; // Already flushing
this.processing = true;
try {
while (this.queue.length >= 100) {
// Atomically swap the batch — grab exactly 100, leave the rest
const batch = this.queue.splice(0, 100);
await this.processBatch(batch);
}
} finally {
this.processing = false;
}
}
private async processBatch(orders: Order[]): Promise<void> {
await db.orders.createMany({ data: orders });
}
}

// BUGGY: One failure kills everything
async function loadDashboard(userId: string): Promise<Dashboard> {
// If getNotifications fails, the ENTIRE dashboard fails
// Even though orders and profile loaded fine
const [orders, profile, notifications] = await Promise.all([
getOrders(userId),
getProfile(userId),
getNotifications(userId), // This flaky service brings everything down
]);
return { orders, profile, notifications };
}
// FIXED: Use allSettled for non-critical operations
async function loadDashboard(userId: string): Promise<Dashboard> {
const [ordersResult, profileResult, notificationsResult] =
await Promise.allSettled([
getOrders(userId),
getProfile(userId),
getNotifications(userId),
]);
// Critical data: throw if missing
if (ordersResult.status === 'rejected') {
throw new Error(`Failed to load orders: ${ordersResult.reason}`);
}
if (profileResult.status === 'rejected') {
throw new Error(`Failed to load profile: ${profileResult.reason}`);
}
// Non-critical data: gracefully degrade
const notifications =
notificationsResult.status === 'fulfilled'
? notificationsResult.value
: [];
if (notificationsResult.status === 'rejected') {
logger.warn('Failed to load notifications', {
userId,
error: notificationsResult.reason,
});
}
return {
orders: ordersResult.value,
profile: profileResult.value,
notifications,
};
}

Performance bugs are different from correctness bugs. The code works — it just works slowly. And "slowly" is relative. A 200ms API response might be fine for a dashboard but catastrophic for a real-time game.
I have watched senior engineers spend days optimizing code that was not the bottleneck. The golden rule: never optimize without profiling first.
// Simple but effective: measure individual operations
function withTiming<T>(label: string, fn: () => T): T {
const start = performance.now();
const result = fn();
const duration = performance.now() - start;
if (duration > 100) { // Only log slow operations
logger.warn(`Slow operation: ${label} took ${duration.toFixed(2)}ms`);
}
return result;
}
async function withTimingAsync<T>(label: string, fn: () => Promise<T>): Promise<T> {
const start = performance.now();
const result = await fn();
const duration = performance.now() - start;
if (duration > 100) {
logger.warn(`Slow operation: ${label} took ${duration.toFixed(2)}ms`);
}
return result;
}
// Usage
const users = await withTimingAsync('fetchActiveUsers', () =>
db.users.findMany({ where: { active: true } })
);
const processed = withTiming('processUsers', () =>
users.map(transformUser)
);

Flame graphs are the single most useful tool for CPU performance debugging. They show you exactly where your process is spending time, down to the individual function.
# Method 1: Node.js built-in profiler
node --prof your-app.js
# After exercising the app, you get an isolate-*-v8.log file
node --prof-process isolate-*.log > processed.txt
# Method 2: 0x — generates interactive flame graphs
npx 0x your-app.js
# Exercise the app, then Ctrl+C
# Opens a browser with an interactive flame graph
# Method 3: clinic.js — the most comprehensive profiler
npx clinic doctor -- node your-app.js
# Or for flame graphs specifically:
npx clinic flame -- node your-app.js

This is the most common performance bug in web applications and most developers do not realize they have it:
// BUGGY: N+1 queries — 1 query for posts + N queries for authors
async function getPosts(): Promise<PostWithAuthor[]> {
const posts = await db.posts.findMany({ take: 50 }); // 1 query
// For each post, query the author individually
const postsWithAuthors = await Promise.all(
posts.map(async (post) => {
const author = await db.users.findUnique({
where: { id: post.authorId },
}); // N queries (50 in this case)
return { ...post, author };
})
);
return postsWithAuthors; // Total: 51 queries for 50 posts
}
// FIXED: Single query with join
async function getPosts(): Promise<PostWithAuthor[]> {
return db.posts.findMany({
take: 50,
include: { author: true }, // Single query with JOIN
});
// Total: 1 query
}
// ALTERNATIVE FIX: DataLoader pattern for complex cases
import DataLoader from 'dataloader';
const userLoader = new DataLoader(async (userIds: readonly string[]) => {
const users = await db.users.findMany({
where: { id: { in: [...userIds] } },
});
// DataLoader requires results in the same order as keys
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id) ?? null);
});
async function getPosts(): Promise<PostWithAuthor[]> {
const posts = await db.posts.findMany({ take: 50 });
const postsWithAuthors = await Promise.all(
posts.map(async (post) => {
// DataLoader batches these into a single query automatically
const author = await userLoader.load(post.authorId);
return { ...post, author };
})
);
return postsWithAuthors; // Total: 2 queries (1 for posts, 1 batched for authors)
}

These are fundamentally different problems that require different solutions:
Latency = How long does ONE request take? Throughput = How many requests can you handle per second?
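The relationship between the two is worth making concrete. A back-of-envelope sketch using Little's law, with illustrative numbers:

```typescript
// Little's law: concurrency = throughput × latency.
// Rearranged: throughput = concurrency / latency.
const latencySeconds = 0.1; // each request takes 100ms
const concurrency = 10; // 10 requests can be in flight at once
const throughput = concurrency / latencySeconds;
console.log(throughput); // 100 requests per second
// Halving latency doubles throughput at the same concurrency;
// adding concurrency raises throughput without touching latency.
```

This is why the two problems have different fixes: latency work shrinks the denominator, scaling work grows the numerator.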
// Latency problem: individual requests are slow
// Diagnosis: time individual operations
// Solution: optimize the hot path, add caching, fix slow queries
// Throughput problem: server can't handle enough concurrent requests
// Diagnosis: load test and watch where it saturates
// Solution: increase concurrency, reduce blocking, scale horizontally
// Example: This function has great throughput but bad latency
async function getReport(userId: string): Promise<Report> {
// These run sequentially — total time is SUM of all operations
const orders = await getOrders(userId); // 200ms
const profile = await getProfile(userId); // 50ms
const analytics = await getAnalytics(userId); // 300ms
// Total: 550ms latency (bad for a single request)
return buildReport(orders, profile, analytics);
}
// Fixed for latency: parallelize independent operations
async function getReport(userId: string): Promise<Report> {
// These run in parallel — total time is MAX of all operations
const [orders, profile, analytics] = await Promise.all([
getOrders(userId), // 200ms
getProfile(userId), // 50ms
getAnalytics(userId), // 300ms
]);
// Total: 300ms latency (the slowest operation)
return buildReport(orders, profile, analytics);
}
// Example: This function has great latency but bad throughput
import { readFileSync } from 'node:fs';
app.get('/api/data', (req, res) => {
// readFileSync blocks the event loop
// While this is reading the file, NO other requests can be processed
const data = readFileSync('/data/big-file.json', 'utf-8');
res.json(JSON.parse(data));
});
// Fixed for throughput: use async I/O
import { readFile } from 'node:fs/promises';
app.get('/api/data', async (req, res) => {
// Non-blocking — other requests can be processed while this reads
const data = await readFile('/data/big-file.json', 'utf-8');
res.json(JSON.parse(data));
});

// Prisma query logging
const prisma = new PrismaClient({
log: [
{ emit: 'event', level: 'query' },
],
});
prisma.$on('query', (e) => {
if (e.duration > 100) { // Log queries slower than 100ms
logger.warn('Slow query detected', {
query: e.query,
params: e.params,
duration: `${e.duration}ms`,
timestamp: e.timestamp,
});
}
});
// PostgreSQL: Find the actual slow queries
// Run this in your database:
/*
SELECT
calls,
mean_exec_time::numeric(10,2) as avg_ms,
total_exec_time::numeric(10,2) as total_ms,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
*/
// Common fixes for slow queries:
// 1. Add an index
// CREATE INDEX idx_orders_user_id ON orders(user_id);
// CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
// 2. Add a composite index for queries that filter on multiple columns
// CREATE INDEX idx_orders_user_status ON orders(user_id, status);
// 3. Use EXPLAIN ANALYZE to understand the query plan
// EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = '123' AND status = 'pending';

When the Node.js event loop is blocked, everything slows down. Here is how to detect and fix it:
// Detect event loop lag
let lastCheck = performance.now();
setInterval(() => {
const now = performance.now();
const lag = now - lastCheck - 1000; // Expected interval is 1000ms
lastCheck = now;
if (lag > 100) {
logger.warn('Event loop lag detected', {
lagMs: lag.toFixed(2),
// Anything above 100ms is noticeable
// Anything above 1000ms means the app is frozen
});
}
}, 1000).unref();
// Common causes of event loop blocking:
// 1. Synchronous file I/O (readFileSync, writeFileSync)
// 2. JSON.parse/JSON.stringify on huge objects
// 3. CPU-heavy computation (crypto, compression, sorting large arrays)
// 4. Long-running regular expressions
// Fix: Offload heavy work to worker threads
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';
// main.ts
function computeHeavy(data: unknown): Promise<unknown> {
return new Promise((resolve, reject) => {
const worker = new Worker(new URL('./heavy-worker.ts', import.meta.url), {
workerData: data,
});
worker.on('message', resolve);
worker.on('error', reject);
});
}
// heavy-worker.ts
if (!isMainThread && parentPort) {
const result = expensiveComputation(workerData);
parentPort.postMessage(result);
}

Here is the hard truth: most real bugs happen in production, where you cannot attach a debugger. You cannot add console.log. You cannot reproduce the user's exact environment. You have logs, metrics, and your wits. This section is about making those count.
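On the metrics side, the single habit that pays off most is tracking percentiles rather than averages, because averages hide exactly the outliers you are debugging. A minimal sketch (a real system would use Prometheus or StatsD; `DurationHistogram` is a hypothetical name):

```typescript
// Tiny in-process histogram: enough to see why percentiles matter.
class DurationHistogram {
  private samples: number[] = [];

  record(ms: number): void {
    this.samples.push(ms);
  }

  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const idx = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
  }
}

const h = new DurationHistogram();
[12, 15, 11, 250, 14].forEach((ms) => h.record(ms));
console.log(h.percentile(50)); // 14
console.log(h.percentile(99)); // 250 (the outlier the average hides)
```

The average of those five samples is about 60ms, which describes no request that actually happened. The p99 tells you what your unluckiest users experienced.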
Unstructured logs are useless at scale. When you have 10 servers each handling 1000 requests per second, console.log('user login failed') tells you nothing.
// USELESS in production
console.log('Failed to process order');
console.log('Error:', error.message);
console.log('Order ID:', orderId);
// USEFUL in production
logger.error('Order processing failed', {
orderId,
userId: req.user?.id,
action: 'processOrder',
errorType: error.constructor.name,
errorMessage: error.message,
errorStack: error.stack,
requestId: req.headers['x-request-id'],
duration: Date.now() - startTime,
input: {
itemCount: req.body.items?.length,
totalAmount: req.body.totalAmount,
paymentMethod: req.body.paymentMethod,
// Do NOT log sensitive data (card numbers, passwords, tokens)
},
});

Here is my production logging setup with Pino:
import pino from 'pino';
const logger = pino({
level: process.env.LOG_LEVEL ?? 'info',
timestamp: pino.stdTimeFunctions.isoTime,
formatters: {
level: (label) => ({ level: label }),
bindings: (bindings) => ({
pid: bindings.pid,
host: bindings.hostname,
service: 'api',
version: process.env.APP_VERSION ?? 'unknown',
}),
},
redact: {
paths: [
'req.headers.authorization',
'req.headers.cookie',
'*.password',
'*.token',
'*.secret',
'*.creditCard',
'*.ssn',
],
censor: '[REDACTED]',
},
serializers: {
err: pino.stdSerializers.err,
req: pino.stdSerializers.req,
res: pino.stdSerializers.res,
},
});
// Create child loggers with request context
function requestLogger(req: Request): pino.Logger {
return logger.child({
requestId: req.headers['x-request-id'] ?? crypto.randomUUID(),
userId: req.user?.id,
path: req.path,
method: req.method,
});
}
// Now every log line from this request includes the context automatically
app.use((req, res, next) => {
req.log = requestLogger(req);
req.log.info('Request started');
const startTime = Date.now();
res.on('finish', () => {
req.log.info('Request completed', {
statusCode: res.statusCode,
duration: Date.now() - startTime,
});
});
next();
});

When a request passes through multiple services, you need to follow it across boundaries:
// Propagate trace context through every service
import { randomUUID } from 'node:crypto';
interface TraceContext {
traceId: string; // Unique ID for the entire request chain
spanId: string; // Unique ID for this specific operation
parentSpanId?: string; // The span that called this one
}
function createTrace(incoming?: { traceId?: string; parentSpanId?: string }): TraceContext {
return {
traceId: incoming?.traceId ?? randomUUID(),
spanId: randomUUID(),
parentSpanId: incoming?.parentSpanId,
};
}
// Middleware: extract or create trace context
app.use((req, res, next) => {
const trace = createTrace({
traceId: req.headers['x-trace-id'] as string,
parentSpanId: req.headers['x-span-id'] as string,
});
req.trace = trace;
req.log = logger.child({
traceId: trace.traceId,
spanId: trace.spanId,
parentSpanId: trace.parentSpanId,
});
// Include trace ID in response for debugging
res.setHeader('x-trace-id', trace.traceId);
next();
});
// When calling another service, propagate the trace
async function callOrderService(trace: TraceContext, data: unknown): Promise<unknown> {
const childTrace = createTrace({
traceId: trace.traceId,
parentSpanId: trace.spanId,
});
logger.info('Calling order service', {
traceId: childTrace.traceId,
spanId: childTrace.spanId,
parentSpanId: childTrace.parentSpanId,
});
const response = await fetch('http://order-service/api/orders', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-trace-id': childTrace.traceId,
'x-span-id': childTrace.spanId,
},
body: JSON.stringify(data),
});
return response.json();
}
// Now you can search all logs for a single traceId
// and reconstruct the entire request flow across services

The hardest part of production debugging is reproduction. Here is my systematic approach:
// Step 1: Capture the exact request that failed
// Your structured logging should already include this
/*
{
"level": "error",
"message": "Order processing failed",
"requestId": "abc-123",
"path": "/api/orders",
"method": "POST",
"input": { "itemCount": 3, "totalAmount": 299.97 },
"userId": "user-456",
"errorMessage": "Cannot read properties of null (reading 'address')"
}
*/
// Step 2: Create a reproduction test
describe('Order processing bug - PROD-1234', () => {
it('should handle user with null shipping address', async () => {
// Recreate the exact conditions from production
const user = await createTestUser({
id: 'user-456',
shippingAddress: null, // This is the condition that caused the bug
});
const orderInput = {
userId: user.id,
items: [
{ productId: 'prod-1', qty: 1, price: 99.99 },
{ productId: 'prod-2', qty: 2, price: 99.99 },
],
};
// This should NOT throw — it should return a validation error
const result = await processOrder(orderInput);
expect(result.error).toBe('Shipping address is required');
});
});
// Step 3: Write the fix
async function processOrder(input: OrderInput): Promise<OrderResult> {
const user = await db.users.findUnique({
where: { id: input.userId },
include: { shippingAddress: true },
});
if (!user) {
return { error: 'User not found' };
}
// The fix: check for null address BEFORE using it
if (!user.shippingAddress) {
return { error: 'Shipping address is required' };
}
// Now safe to use user.shippingAddress
const shipping = calculateShipping(user.shippingAddress);
// ...
}

Feature flags are not just for gradual rollouts. They are powerful debugging tools:
// Use feature flags to add debug logging in production
// WITHOUT deploying new code
interface FeatureFlags {
debugOrderProcessing: boolean;
debugPaymentFlow: boolean;
verboseLogging: boolean;
slowQueryThreshold: number;
}
async function getFlags(userId?: string): Promise<FeatureFlags> {
// Fetch from your feature flag service (LaunchDarkly, Unleash, etc.)
// or from a simple database/Redis store
const flags = await flagService.getFlags(userId);
return flags;
}
async function processPayment(order: Order): Promise<PaymentResult> {
const flags = await getFlags(order.userId);
if (flags.debugPaymentFlow) {
logger.info('Payment debug: starting', {
orderId: order.id,
amount: order.total,
method: order.paymentMethod,
userId: order.userId,
});
}
const result = await paymentProvider.charge({
amount: order.total,
currency: order.currency,
method: order.paymentMethod,
});
if (flags.debugPaymentFlow) {
logger.info('Payment debug: result', {
orderId: order.id,
success: result.success,
transactionId: result.transactionId,
providerResponse: result.raw, // Only log raw response when debugging
});
}
return result;
}

After every significant production incident, write a post-mortem. Not to blame anyone — to learn and prevent recurrence:
## Incident: Order totals incorrect for 2 hours
**Date:** 2026-03-15
**Duration:** 2h 17m (10:43 - 13:00 UTC)
**Severity:** High (affected revenue calculations)
**Detection:** Customer support ticket (not automated — need better monitoring)
### Timeline
- 09:30 - Deploy v2.4.7 (included pricing engine refactor)
- 10:43 - First customer reports incorrect order total
- 11:15 - Engineering begins investigation
- 11:45 - Root cause identified (floating point rounding in new pricing code)
- 12:30 - Fix deployed (v2.4.8)
- 13:00 - Confirmed all orders processing correctly. Affected orders manually corrected.
### Root Cause
The pricing refactor changed `Math.round(price * 100) / 100` to
`parseFloat(price.toFixed(2))`. These produce different results for
certain values (e.g., 35.855 rounds to 35.85 with toFixed but 35.86
with the round approach). 847 orders were affected.
### What Went Well
- Quick root cause identification once engineering engaged
- Manual correction of affected orders within 2 hours
### What Went Wrong
- No automated test for pricing edge cases
- No monitoring alert for unusual order total distributions
- 32-minute gap between first report and engineering engagement
### Action Items
- [ ] Add property-based tests for pricing calculations
- [ ] Add monitoring for order total anomalies (std dev alert)
- [ ] Route "pricing" support tickets directly to engineering Slack
- [ ] Add end-to-end test that processes a known order and verifies the total

I know, I know. "Talk to a rubber duck" sounds like a joke. It is not. It is one of the most reliable debugging techniques ever discovered, and there is actual cognitive science behind why it works.
When you try to explain a problem to someone (or something), you are forced to:
Articulate your assumptions. You cannot explain the system without stating what you think it does. Saying it out loud often reveals that one of those assumptions is wrong.
Linearize your thoughts. Bugs create a tangle of confused, non-linear thinking. Explaining forces you to lay out the steps sequentially. "First X happens, then Y, then Z" — and often at step Y you realize "wait, that cannot be right."
Fill in gaps. In your head, you gloss over the parts you "know." When explaining, you have to make them explicit. The gap you glossed over is frequently where the bug lives.
Before asking for help, write the bug report. The act of writing it will often solve the problem before you send it:
## Bug Report Template
### What I expected to happen:
When a user submits the feedback form with a rating of 1-5 and a comment,
the data should be saved and a confirmation should appear.
### What actually happens:
The form submits successfully (200 response), but the rating is always
saved as 0 regardless of what the user selected.
### Steps to reproduce:
1. Go to /feedback
2. Select 4 stars
3. Type "Great service"
4. Click Submit
5. Check the database: rating is 0
### What I have already tried:
1. Confirmed the form sends the correct value (network tab shows rating: 4)
2. Confirmed the API receives the correct value (server log shows rating: 4)
3. Confirmed the Prisma schema has rating as Int
4. ...wait. The Prisma schema has `rating Int @default(0)`.
Let me check the create call...
### ROOT CAUSE FOUND WHILE WRITING THIS:
The Prisma create call uses `data: { ...req.body }` but req.body.rating
is a string "4" (from form data), and Prisma silently uses the default
value when the type doesn't match instead of throwing.
Fix: `rating: parseInt(req.body.rating, 10)`

I am not exaggerating when I say this happens to me at least once a week. The act of writing a precise bug report surfaces the answer. Senior developers know this, which is why they insist on detailed bug reports — not to be bureaucratic, but because writing the report is part of the debugging process.
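The root cause in that report is worth a two-line demonstration, because it bites constantly. The object shape here is illustrative:

```typescript
// Form-encoded bodies arrive as strings, even for numeric fields.
const body: Record<string, unknown> = { rating: '4' };

// Spreading the body keeps the string; the type mismatch survives silently
const spread = { ...body };
console.log(typeof spread.rating); // 'string'

// Parse explicitly at the boundary
const rating = Number.parseInt(String(body.rating), 10);
console.log(rating, typeof rating); // 4 'number'
```

Anywhere external data crosses into typed code (form bodies, query strings, environment variables), parse and validate at the boundary instead of trusting the shape.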
When you are truly stuck, pair debugging with a colleague is dramatically more effective than solo debugging. But do it right:
The Driver-Navigator Model: the driver shares their screen and narrates what they are doing and why; the navigator stays off the keyboard, questions assumptions, and keeps the investigation systematic instead of reactive.
The navigator's most powerful question is: "What are you sure about?" Force the driver to enumerate their certainties. One of those certainties is wrong.
After 90 minutes of stuck debugging, your brain is in a rut. You are looking at the same code, making the same assumptions, trying the same things. The single most effective thing you can do is walk away for 15 minutes.
This is not laziness. This is neuroscience. When you stop actively thinking about a problem, your brain's default mode network takes over. It makes connections between ideas that your focused mode missed. This is why solutions appear in the shower or on a walk. You are not being unproductive — you are letting a different part of your brain take a turn.
I have a personal rule: if I have been stuck for more than an hour, I take a 15-minute walk. I solve more bugs on walks than at my desk.
These are real bugs (with details changed to protect the guilty). Each one taught me something about debugging that no textbook ever could.
A customer reported that their scheduled reports were not being generated. But only sometimes. We checked the logs, checked the cron job, checked the queue — everything looked fine. Reports generated correctly when we tested manually.
After two weeks of intermittent failures and escalating frustration, I noticed the pattern: failures only happened on Tuesdays. Not every Tuesday, but only on Tuesdays. What is special about Tuesdays?
The cron expression was 0 9 * * 2 — run at 9 AM on day 2 of the week. That is Tuesday. The report generation function started by fetching "last week's data":
// The bug
function getLastWeekRange(): { start: Date; end: Date } {
const now = new Date();
const dayOfWeek = now.getDay(); // 0=Sunday, 1=Monday, 2=Tuesday...
// "Start of last week" = go back to the previous Monday
const start = new Date(now);
start.setDate(now.getDate() - dayOfWeek - 6); // BUG HERE
const end = new Date(start);
end.setDate(start.getDate() + 7);
return { start, end };
}

The formula now.getDate() - dayOfWeek - 6 was supposed to find the previous Monday. On most days it worked fine. But on Tuesday (dayOfWeek = 2), if the current date was early in the month (say, March 2nd), getDate() - dayOfWeek - 6 would be 2 - 2 - 6 = -6. JavaScript's Date.setDate(-6) does not throw — it goes back to a previous month. The resulting date range was bizarre, querying data from the distant past, returning zero results.
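The setDate rollover is easy to verify in isolation (dates chosen to match the story):

```typescript
// Date.setDate accepts out-of-range values and rolls over instead of throwing.
// Day 0 is the last day of the previous month; negative values keep going back.
const d = new Date(2026, 2, 2); // March 2, 2026 (months are zero-indexed)
d.setDate(-6); // "day -6 of March" silently lands in February
console.log(d.getMonth(), d.getDate()); // 1 22, i.e. February 22
```

No exception, no warning. The silent rollover is what let the bad range reach the query instead of failing loudly.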
The fix was trivial:
// The fix: use proper date library
import { startOfWeek, subWeeks, endOfWeek } from 'date-fns';
function getLastWeekRange(): { start: Date; end: Date } {
const lastWeek = subWeeks(new Date(), 1);
return {
start: startOfWeek(lastWeek, { weekStartsOn: 1 }), // Monday
end: endOfWeek(lastWeek, { weekStartsOn: 1 }),
};
}

Lesson: When a bug is intermittent, look for patterns in WHEN it happens. Time-related bugs love hiding in edge cases: month boundaries, year boundaries, DST transitions, leap years, and specific days of the week.
We had a data pipeline that processed user events and aggregated them into daily summaries. It worked perfectly in development and testing. In production, the summaries were slightly wrong — but only slightly. Revenue numbers were off by fractions of a percent. Nobody noticed for a month.
When we finally caught it (a finance team member noticed quarterly totals did not match), we found the bug:
// The bug: floating point accumulation
interface DailySummary {
totalRevenue: number;
orderCount: number;
}
async function aggregateDaily(events: OrderEvent[]): Promise<DailySummary> {
let totalRevenue = 0;
for (const event of events) {
totalRevenue += event.amount; // Floating point addition accumulates errors
}
return { totalRevenue, orderCount: events.length };
}
// With thousands of additions:
// 0.1 + 0.2 = 0.30000000000000004
// Over 50,000 events per day, the error accumulated to real money

The data was not corrupt in the database — the original events were fine. But the aggregation introduced rounding errors that compounded over time. By the time we found it, a month of daily summaries were all slightly wrong.
// The fix: use integer arithmetic for money
async function aggregateDaily(events: OrderEvent[]): Promise<DailySummary> {
let totalCents = 0;
for (const event of events) {
// Store and calculate in cents (integers), display in dollars
totalCents += Math.round(event.amount * 100);
}
return {
totalRevenue: totalCents / 100,
orderCount: events.length,
};
}
// Even better: use a proper decimal library
import { Decimal } from 'decimal.js';
async function aggregateDaily(events: OrderEvent[]): Promise<DailySummary> {
let total = new Decimal(0);
for (const event of events) {
total = total.plus(new Decimal(event.amount));
}
return {
totalRevenue: total.toNumber(),
orderCount: events.length,
};
}
Lesson: Some bugs do not crash. They corrupt data slowly, quietly, over time. These are the most expensive bugs because by the time you notice, the damage is extensive. For anything involving money, use integer arithmetic or a decimal library. And have automated checks that validate aggregated data against source data.
Our API started returning 503 errors intermittently. About 5% of requests were failing. The error logs showed connection timeouts to our payment provider's API. We spent two days investigating:
On day three, I noticed that the failures correlated with specific server instances in our load balancer pool. Servers that had been running for more than 24 hours failed. Recently restarted servers worked fine.
The root cause: the payment provider had changed their API endpoint's IP address (a DNS record update). Our long-running Node.js processes were still talking to the old IP — keep-alive sockets stay pinned to the address they were opened with and never re-resolve, so old processes kept the stale IP while freshly restarted processes resolved the new one.
// The root cause: long-running processes kept using a stale IP.
// A keep-alive socket is pinned to the IP it was opened with and never
// re-resolves DNS for as long as it stays open.
// Fix 1: Recycle keep-alive connections periodically so new sockets
// trigger fresh DNS lookups. The built-in https.Agent has no
// max-connection-age option, so recycle the agent on a timer or use a
// client that supports connection lifetimes.
import { Agent as HttpsAgent } from 'node:https';
const httpsAgent = new HttpsAgent({
keepAlive: true,
// Long-lived sockets from this agent keep talking to a decommissioned
// IP until they are closed.
});
// Fix 2: Use a TTL-aware DNS cache, e.g. the cacheable-lookup package
// (got uses it), which honors the TTL the DNS server specifies.
// Fix 3: Add retry logic for transient DNS and connection issues
async function callPaymentAPI(data: unknown): Promise<unknown> {
const maxRetries = 3;
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
const response = await fetch('https://api.payment-provider.com/charge', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(data),
signal: AbortSignal.timeout(5000),
});
if (!response.ok) {
// Treat 5xx (like the 503s in this story) as retryable failures
throw new Error(`Payment API returned ${response.status}`);
}
return await response.json();
} catch (error) {
if (attempt === maxRetries - 1) throw error;
logger.warn('Payment API call failed, retrying', {
attempt: attempt + 1,
error: error instanceof Error ? error.message : String(error),
});
// Exponential backoff
await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
}
}
// Unreachable (the last attempt rethrows), but satisfies TypeScript's
// control-flow analysis that all paths return or throw
throw new Error('unreachable');
}
Lesson: Not every bug is in your code. Infrastructure issues — DNS, TLS certificates, network routing, cloud provider quirks — can masquerade as application bugs. When you have eliminated all code-level causes, expand your investigation to the infrastructure layer.
An e-commerce platform ran a flash sale: 50% off everything, midnight to midnight, March 15th. The sale was supposed to end at midnight Eastern Time. It ended at midnight UTC — 4 hours early on the East Coast, which is on daylight saving time (UTC-4) by mid-March. Four hours of full-price orders that customers expected to be discounted.
// The bug
function isSaleActive(saleConfig: SaleConfig): boolean {
const now = new Date();
return now >= saleConfig.startDate && now <= saleConfig.endDate;
}
// saleConfig was created with:
const saleConfig = {
startDate: new Date('2026-03-15'), // Midnight UTC, not midnight ET!
endDate: new Date('2026-03-16'), // Also midnight UTC
};
// new Date('2026-03-15') creates a Date at 2026-03-15T00:00:00.000Z (UTC)
// In mid-March, US Eastern Time is on daylight saving (UTC-4), so that
// instant is 2026-03-14T20:00:00 ET (8 PM the day before!)
// The sale started 4 hours early and ended 4 hours early.
The fix required thinking explicitly about timezones everywhere:
// The fix: always be explicit about timezones
// (isSaleActive itself is unchanged: comparing Date objects is
// timezone-agnostic; the bug was in how saleConfig's dates were created)
function isSaleActive(saleConfig: SaleConfig): boolean {
const now = new Date();
return now >= saleConfig.startDate && now <= saleConfig.endDate;
}
// Create sale config with explicit timezone
function createSaleConfig(
startDateStr: string, // '2026-03-15'
endDateStr: string, // '2026-03-16'
timezone: string // 'America/New_York'
): SaleConfig {
// Use Intl to create timezone-aware dates
const startDate = zonedTimeToUtc(startDateStr, timezone);
const endDate = zonedTimeToUtc(endDateStr, timezone);
return { startDate, endDate, timezone };
}
// Helper: convert "midnight in timezone X" to UTC
function zonedTimeToUtc(dateStr: string, timezone: string): Date {
// Format a reference instant in the target timezone to discover its
// UTC offset for that date
const formatter = new Intl.DateTimeFormat('en-US', {
timeZone: timezone,
year: 'numeric',
month: '2-digit',
day: '2-digit',
hour: '2-digit',
minute: '2-digit',
second: '2-digit',
hour12: false,
timeZoneName: 'longOffset',
});
const parts = formatter.formatToParts(new Date(`${dateStr}T00:00:00Z`));
// The longOffset value looks like "GMT-04:00" ("GMT" alone means UTC),
// so strip the prefix before building an ISO string with the offset
const name = parts.find(p => p.type === 'timeZoneName')?.value ?? 'GMT';
const offset = name === 'GMT' ? '+00:00' : name.replace('GMT', '');
return new Date(`${dateStr}T00:00:00${offset}`);
}
// Caveat: if a DST transition falls at local midnight on dateStr, this
// offset lookup can be off by the transition amount; a timezone library
// (e.g. date-fns-tz) handles those edge cases properly.
// RULE: Never use new Date('YYYY-MM-DD') for business logic.
// It creates a UTC date, which is almost never what you want.
// Always specify the timezone explicitly.
The company had to honor the sale price for all affected orders. Cost: roughly $50K in margins. All because new Date('2026-03-15') means midnight UTC, not midnight in the business's timezone.
Lesson: Timezone bugs are not edge cases. They are guaranteed to happen if you use new Date() without thinking about timezones. Store all dates in UTC, but convert to the user's or business's timezone for any business logic involving "days" or "midnight." Test with timezones on both sides of UTC.
A Next.js API route was crashing with TypeError: Cannot read properties of null (reading 'id'). The stack trace pointed to a line where we accessed session.user.id. But we had middleware that checks for a valid session. If there is no session, the middleware returns 401 before the handler runs. So how could session.user be null?
Three days of investigation. I added logging everywhere. I checked the middleware order. I checked the session store. I checked the authentication library. Everything was correct. The session existed. The user existed. And yet — null.
On day three, out of desperation, I logged the entire session object:
// What I expected:
// { user: { id: "abc-123", email: "user@example.com" }, expires: "..." }
// What I got:
// { user: { id: "abc-123", email: "user@example.com" }, expires: "..." }
// Looks correct! But wait...
// I logged it differently:
console.log('session type:', typeof session);
console.log('session.user type:', typeof session.user);
console.log('session.user:', JSON.stringify(session.user));
console.log('session.user.id:', session.user.id);
// Output:
// session type: object
// session.user type: object
// session.user: {"id":"abc-123","email":"user@example.com"}
// TypeError: Cannot read properties of null (reading 'id')
Wait. session.user is an object that JSON.stringifies correctly, but accessing .id on it throws a null error? That is impossible... unless there are two different .user properties.
The session library was using a Proxy that lazily loaded the user. The JSON.stringify called the proxy's toJSON() method, which loaded the user. But the direct property access hit a stale cached value from a previous failed load. The proxy cached null on first access failure and never retried.
// The actual bug was in a dependency's proxy implementation
// Simplified version of what was happening:
class LazySession {
private _user: User | null | undefined = undefined;
private _userId: string;
constructor(userId: string) {
this._userId = userId;
}
get user(): User | null {
if (this._user === undefined) {
// First access: try to load
try {
this._user = loadUserSync(this._userId); // Could return null on timeout
} catch {
this._user = null; // Cache the failure forever — BUG
}
}
return this._user;
}
toJSON(): object {
// toJSON bypassed the cache and always loaded fresh
return {
user: loadUserSync(this._userId), // This worked because it loaded fresh
expires: this.expires,
};
}
}
The fix was one line — do not cache null:
get user(): User | null {
if (this._user === undefined || this._user === null) {
try {
this._user = loadUserSync(this._userId);
} catch {
return null; // Return null but don't cache it
}
}
return this._user;
}
Lesson: When the impossible happens — when the data looks correct but the code still fails — question the abstraction layer. Proxies, getters, setters, and lazy loading can create situations where console.log(obj) and direct property access give different results. When in doubt, use Object.getOwnPropertyDescriptor() to understand what a property actually is.
// Debugging trick: check if a property is a plain value or a getter/proxy
const descriptor = Object.getOwnPropertyDescriptor(session, 'user');
console.log('Is getter?', !!descriptor?.get);
console.log('Is value?', 'value' in (descriptor ?? {}));
console.log('Configurable?', descriptor?.configurable);
// Check the prototype chain too
const protoDescriptor = Object.getOwnPropertyDescriptor(
Object.getPrototypeOf(session),
'user'
);
console.log('Proto getter?', !!protoDescriptor?.get);
Debugging skill is not innate. It is built through deliberate practice and pattern recognition. Here is how to get better over time.
Every time you fix a non-trivial bug, write down:
// Example bug journal entry (I keep mine in a simple JSON file):
interface BugJournalEntry {
date: string;
title: string;
symptom: string;
redHerrings: string[];
rootCause: string;
technique: string;
timeSpent: string;
fasterNext: string;
}
const entry: BugJournalEntry = {
date: '2026-03-15',
title: 'Orders API returning empty arrays',
symptom: 'GET /api/orders?status=pending returns [] even though there are pending orders in DB',
redHerrings: [
'Checked query builder — looked correct',
'Checked database directly — orders exist',
'Checked permissions middleware — not filtering',
],
rootCause: 'Prisma enum value was "PENDING" but DB had "pending" (case mismatch after migration)',
technique: 'Logged the raw SQL query and ran it manually in psql',
timeSpent: '2 hours',
fasterNext: 'Should have logged the raw SQL immediately instead of trusting the ORM',
};
After a few months, you will start seeing patterns. You will realize that 30% of your bugs are related to data type mismatches, or that you waste the most time on bugs caused by caching. That awareness lets you build heuristics: "If the data looks correct but the query returns nothing, check for case/type mismatches first."
The best debuggers have accurate mental models of how their systems work. Not just their own code — the layers beneath it. HTTP, TCP, DNS, the event loop, garbage collection, database query planners, operating system scheduling.
You do not need to know these at the implementor level. You need to know them at the "what can go wrong" level:
Mental model: HTTP request lifecycle
1. DNS resolution (can fail: ENOTFOUND, stale cache)
2. TCP handshake (can fail: ECONNREFUSED, timeout)
3. TLS handshake (can fail: certificate expired, hostname mismatch)
4. Request sent (can fail: EPIPE, connection reset)
5. Server processing (can fail: timeout, 5xx)
6. Response received (can fail: incomplete body, invalid JSON)
7. Connection pooling (can cause: stale connections, pool exhaustion)
Mental model: Node.js event loop
1. Timers (setTimeout, setInterval)
2. Pending callbacks (I/O callbacks)
3. Idle, prepare (internal)
4. Poll (incoming connections, data)
5. Check (setImmediate)
6. Close callbacks (socket.on('close'))
Key insight: If any phase takes too long, ALL subsequent phases are delayed.
A CPU-intensive synchronous operation in a request handler blocks EVERYTHING.
When faced with any bug, work through this hierarchy from top to bottom:
1. READ THE ERROR MESSAGE (fully, carefully)
↓ Still stuck?
2. CHECK WHAT CHANGED (git log, deploys, config changes)
↓ Still stuck?
3. REPRODUCE MINIMALLY (smallest input that triggers the bug)
↓ Still stuck?
4. BINARY SEARCH (bisect the code path)
↓ Still stuck?
5. ADD TARGETED LOGGING (hypothesize, then log to confirm)
↓ Still stuck?
6. USE THE DEBUGGER (breakpoints, step through)
↓ Still stuck?
7. EXPLAIN THE PROBLEM (rubber duck, write a bug report)
↓ Still stuck?
8. TAKE A BREAK (15 minutes, walk, different context)
↓ Still stuck?
9. ASK FOR HELP (pair debug, fresh eyes)
↓ Still stuck?
10. QUESTION YOUR ASSUMPTIONS (the impossible is happening = wrong mental model)
Most bugs are solved at steps 1-3. If you find yourself regularly reaching step 6, your observation skills (step 1) probably need work.
The ultimate debugging skill is not finding bugs faster — it is writing fewer bugs in the first place. Every bug you debug teaches you a pattern to avoid:
// Pattern: Always validate at system boundaries
// After debugging enough "undefined is not an object" errors,
// you learn to validate inputs before using them
function createOrder(input: unknown): Order {
// Validate at the boundary — never trust external input
const parsed = orderSchema.safeParse(input);
if (!parsed.success) {
throw new ValidationError(parsed.error.issues);
}
// From here on, TypeScript guarantees the shape is correct
const { userId, items, shippingAddress } = parsed.data;
// ...
}
// Pattern: Make invalid states unrepresentable
// After debugging enough "how did this get into an impossible state" bugs,
// you learn to use the type system to prevent them
// BAD: Boolean flags create invalid states
interface Order {
isPaid: boolean;
isShipped: boolean;
isCancelled: boolean;
// Can an order be both paid AND cancelled? Shipped but not paid?
// The type allows all 8 combinations. Some are nonsensical.
}
// GOOD: Discriminated union makes invalid states impossible
type Order =
| { status: 'pending'; createdAt: Date }
| { status: 'paid'; createdAt: Date; paidAt: Date }
| { status: 'shipped'; createdAt: Date; paidAt: Date; shippedAt: Date }
| { status: 'cancelled'; createdAt: Date; cancelledAt: Date; reason: string };
// Now TypeScript enforces: shipped orders MUST have a paidAt date.
// Cancelled orders MUST have a reason. You can't create an invalid state.
// Pattern: Fail fast and loud
// After debugging enough "silent data corruption" bugs,
// you learn to throw errors early rather than propagating bad data
function calculateTax(amount: number, rate: number): number {
if (!Number.isFinite(amount) || amount < 0) {
throw new Error(`Invalid amount: ${amount}`);
}
if (!Number.isFinite(rate) || rate < 0 || rate > 1) {
throw new Error(`Invalid tax rate: ${rate}`);
}
return Math.round(amount * rate * 100) / 100;
}
// NaN propagation stops here instead of corrupting downstream data
Debugging is a skill, not a talent. The developers who seem to find bugs magically fast are not geniuses — they have internalized a systematic process and built a library of patterns through experience.
The process is simple: observe precisely, hypothesize specifically, test minimally, and iterate quickly. The tools — debuggers, profilers, heap snapshots, flame graphs — are powerful but secondary to the thinking process. A developer with good mental models and console.log will outperform a developer with bad mental models and every tool in the world.
Start with the scientific method. Keep a bug journal. Build mental models of the systems you work with. Learn to read error messages — really read them, not just glance at them. And when you are stuck, walk away. Your brain is still working on it.
The bugs never stop. But you get faster. You start recognizing patterns — "this looks like a race condition," "this smells like a caching issue," "this has to be a timezone problem." Each bug you solve thoroughly, understanding not just the fix but the root cause and the pattern, makes you faster at the next one.
That is how senior engineers find bugs 10x faster. Not magic. Just compound interest on paying attention.