The thinking frameworks, back-of-the-envelope math, and architecture patterns that actually matter in system design interviews — from someone who has been on both sides of the table.
I have sat through dozens of system design interviews, both as a candidate and as an interviewer. The pattern I see over and over is candidates who have memorized the "correct" architecture for a URL shortener or a chat application, but crumble the moment you change a single constraint. They have learned answers. They have not learned how to think.
Most system design prep material follows the same formula: here is the problem, here is the solution diagram, memorize it. This approach fails spectacularly in real interviews because interviewers are not testing whether you can regurgitate a reference architecture. They are testing whether you can reason about trade-offs under ambiguity, communicate clearly while exploring a design space, and make defensible decisions with incomplete information.
What follows is everything I wish someone had told me before my first system design interview — the mental models, the math, the patterns, and the mistakes that separate candidates who get offers from candidates who get polite rejections.
The typical study path looks like this: watch a YouTube video about designing Twitter, read a blog post about designing Uber, memorize the components, and hope the interview asks about something similar. This is like studying for a math exam by memorizing the answers to specific problems instead of understanding the underlying concepts.
System design interviews are open-ended conversations. The interviewer gives you a deliberately vague prompt — "design a notification system" — and watches how you navigate from vagueness to clarity. The skills being evaluated are:
Scoping and clarification. Do you ask what kind of notifications? Push, email, SMS, in-app? What scale? What latency requirements? Or do you just start drawing boxes?
Trade-off analysis. When you choose a database, can you articulate why you chose it over alternatives? When you add a cache, do you acknowledge the consistency implications?
Iterative refinement. Do you start simple and add complexity as needed, or do you immediately jump to a microservices architecture with Kafka, Redis, Elasticsearch, and a service mesh for a system that handles 100 requests per minute?
Operational awareness. How will you deploy this? How will you monitor it? What happens when component X fails? These questions reveal whether you have actually built systems or only read about them.
The prep that actually works is building a toolkit of reusable patterns and understanding when each pattern applies. You do not need to memorize 50 system designs. You need to deeply understand maybe 10 fundamental patterns and be able to compose them on the fly.
Every distributed system is composed of the same handful of building blocks. Understanding these deeply — not just what they are, but when they become the bottleneck — is the foundation of system design thinking.
A load balancer distributes incoming traffic across multiple servers. Simple concept, but the devil is in the details.
                 +-----------+
Clients -------->|   Load    |-------> Server A
                 | Balancer  |-------> Server B
                 +-----------+-------> Server C
Algorithms that matter: round robin (simple, a good default when servers are homogeneous), least connections (better when request costs vary widely), and consistent hashing (when the same client or key should keep hitting the same server, e.g. for cache locality or session affinity).
When the load balancer becomes the bottleneck: At extremely high throughput (millions of requests per second), Layer 7 (HTTP) load balancers become CPU-bound because they need to parse HTTP headers. The solution is either Layer 4 (TCP) load balancing, which is much cheaper, or DNS-based load balancing. In practice, most systems use a two-tier approach: DNS/L4 at the edge, L7 internally.
Memory is roughly 100x faster than SSD and thousands of times faster than a network round-trip to a database. Caching exploits this hierarchy.
Request --> Cache HIT? --> Return cached response
              |
             MISS
              |
              v
           Database --> Store in cache --> Return response
When the cache becomes the bottleneck: Cache stampede — when a popular cache key expires and hundreds of concurrent requests all miss the cache simultaneously, hammering the database. Also, memory limits: when your working set exceeds available memory, eviction starts and hit rates drop. I will cover caching strategies in depth later.
Queues decouple producers from consumers, absorb traffic spikes, and enable asynchronous processing.
Producer --> [ Queue ] --> Consumer A
                       --> Consumer B
                       --> Consumer C
When the queue becomes the bottleneck: When consumers cannot keep up with producers for an extended period, the queue grows unboundedly. This manifests as increasing end-to-end latency and, eventually, disk pressure on the broker. The fix is either adding consumers, implementing backpressure, or dropping/dead-lettering messages.
The persistent source of truth. The single most important decision in many system designs.
When the database becomes the bottleneck: Almost always reads. Read-heavy workloads can be addressed with read replicas, caching, or denormalization. Write-heavy workloads require sharding, write-behind patterns, or choosing a database designed for write throughput.
A CDN is a geographically distributed cache for static and semi-static content.
User (Tokyo) --> CDN Edge (Tokyo) --> HIT --> Response (5ms)
                        |
                       MISS
                        |
                        v
                 Origin (US-East) --> Response (200ms) + cache at edge
When the CDN becomes the bottleneck: When your content is too dynamic to cache, when cache invalidation latency is unacceptable, or when you are serving personalized content to every user. At that point, you are paying CDN costs without getting CDN benefits.
Interviewers do not expect you to compute exact numbers. They expect you to demonstrate that you can estimate whether a design is feasible. Here are the calculations that come up repeatedly.
Start with daily active users (DAU), estimate actions per user per day, and convert to requests per second.
QPS = (DAU * actions_per_user_per_day) / 86400
Example: Twitter-like service
- 200M DAU
- Average user views 50 tweets/day, posts 2 tweets/day
- Read QPS: (200M * 50) / 86400 ≈ 115,000 read QPS
- Write QPS: (200M * 2) / 86400 ≈ 4,600 write QPS
- Peak QPS: ~2-3x average ≈ 230,000-345,000 read QPS
These numbers immediately tell you: this is a read-heavy system (25:1 ratio), you need serious caching and read replicas, and a single database will not handle this.
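The arithmetic is simple enough to sanity-check in a few lines. Here is a sketch of the estimate above (the helper names and the 2.5x peak factor are my own choices, not a standard):

```typescript
// Back-of-the-envelope traffic estimation (illustrative helper, not a library).
const SECONDS_PER_DAY = 86_400;

function estimateQps(dau: number, actionsPerUserPerDay: number): number {
  return (dau * actionsPerUserPerDay) / SECONDS_PER_DAY;
}

// Peak traffic is commonly assumed at 2-3x the daily average.
function peakQps(averageQps: number, peakFactor: number = 2.5): number {
  return averageQps * peakFactor;
}

const readQps = estimateQps(200_000_000, 50); // ~115,740 reads/sec
const writeQps = estimateQps(200_000_000, 2); // ~4,630 writes/sec
const readWriteRatio = readQps / writeQps;    // 25:1, a read-heavy system
```

Writing the formula down once like this makes it easy to re-run when the interviewer changes a constraint ("what if DAU is 10x?") without redoing the mental math.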
Estimate the size of a single record, multiply by the number of records over your time horizon (usually 5 years).
Example: URL shortener
- 100M new URLs per month
- Each record: short_url (7 bytes) + long_url (avg 200 bytes) + metadata (50 bytes) ≈ 257 bytes
- 5 years: 100M * 12 * 5 * 257 bytes ≈ 1.54 TB
That fits comfortably on a single machine. No need for sharding yet.
Bandwidth = QPS * average_response_size
Example: Image serving
- 50,000 QPS for images
- Average image size: 500 KB
- Bandwidth: 50,000 * 500 KB = 25 GB/s
That is a LOT of bandwidth. You absolutely need a CDN.
These are approximate and vary by hardware, but they give you the right order of magnitude:
L1 cache reference: 1 ns
L2 cache reference: 4 ns
Main memory reference: 100 ns
SSD random read: 16,000 ns (16 us)
HDD random read: 2,000,000 ns (2 ms)
Round trip within same datacenter: 500,000 ns (0.5 ms)
Round trip US coast to coast: 40,000,000 ns (40 ms)
Single machine throughput:
- Redis: ~100,000 ops/sec
- PostgreSQL: ~10,000 queries/sec (simple queries)
- Kafka: ~1,000,000 messages/sec (per broker, small messages)
Most candidates describe consistency as a binary: either you have it or you do not. Reality is a spectrum, and understanding where your system falls on that spectrum is critical.
After a write completes, every subsequent read (from any node) returns that write. This is what you get with a single database, and what you give up the moment you replicate.
Use when: Correctness matters more than availability. Banking transactions, inventory counts, anything involving money or legal compliance.
Client A: WRITE balance = $500 ------> [PRIMARY]
Client B: READ balance --------------> [REPLICA]
                                       Must return $500
With strong consistency, the system blocks Client B's read
until the replica has received and applied the write.
This adds latency but guarantees correctness.
The cost of strong consistency is latency and availability. If the replica is unreachable, the system must either block (reducing availability) or return an error.
After a write, replicas will eventually converge to the same value, but reads in the interim may return stale data.
Use when: Availability and latency matter more than immediate consistency. Social media feeds, like counts, view counters, recommendation systems.
Client A: WRITE new_post ----> [PRIMARY]
Client B: READ feed ---------> [REPLICA]
                               Might not include new_post yet
                               Will include it eventually (usually ms to seconds)
"Eventually" is doing a lot of work in that sentence. In well-configured systems, "eventually" means milliseconds. In poorly configured systems, it can mean minutes or even hours during network partitions.
Preserves the ordering of causally related operations while allowing concurrent operations to be seen in different orders.
Use when: You need something stronger than eventual consistency but cannot afford strong consistency. Collaborative editing, comment threads, messaging systems.
Alice: POST "Anyone want lunch?"  (timestamp T1)
Bob:   POST "Sure, where?"        (timestamp T2, caused by T1)
Carol: READ feed
       Must see Alice's post before Bob's (causal ordering)
       May not see either yet (eventual delivery)
This is the sweet spot for many real-world systems. Google Docs uses a variant of causal consistency (via operational transforms / CRDTs) to allow concurrent editing without locks while preserving the "intent" of edits.
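One common mechanism for tracking causal order is a vector clock: each node keeps a counter per participant, and an event A causally precedes B exactly when A's clock is dominated by B's. A minimal sketch of the general technique (this is not what Google Docs specifically does; OT and CRDTs are considerably more involved):

```typescript
// Minimal vector-clock sketch for causal ordering (illustrative, not production).
type VectorClock = Record<string, number>;

// Increment the local node's counter before emitting an event.
function tick(clock: VectorClock, node: string): VectorClock {
  return { ...clock, [node]: (clock[node] ?? 0) + 1 };
}

// On receiving a remote clock: take the element-wise max.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const out: VectorClock = { ...a };
  for (const [node, n] of Object.entries(b)) {
    out[node] = Math.max(out[node] ?? 0, n);
  }
  return out;
}

// a happened-before b iff a <= b element-wise and a != b.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  const nodes = Object.keys({ ...a, ...b });
  let strictlyLess = false;
  for (const node of nodes) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x > y) return false;
    if (x < y) strictlyLess = true;
  }
  return strictlyLess;
}

// Alice posts; Bob replies after seeing it, so Bob's clock dominates Alice's.
const alicePost = tick({}, "alice");                // { alice: 1 }
const bobReply = tick(merge(alicePost, {}), "bob"); // { alice: 1, bob: 1 }
```

A replica that sees `bobReply` before `alicePost` can detect the causal dependency and delay displaying the reply until the post arrives.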
"Which database should I use?" is the single most common question in system design interviews. Here is a framework for reasoning about it instead of memorizing a chart.
The right database depends on how you will read and write data, not on what kind of data you have. Ask these questions:
Choose when: You have structured data with relationships, need ACID transactions, your access patterns involve joins and complex queries, and your dataset fits on a single machine or a small number of replicas.
Avoid when: Your data has no fixed schema, you need horizontal write scaling beyond what a single master can handle, or your access patterns are simple key-value lookups (you are paying for SQL parsing overhead you do not need).
-- Relational excels at this: complex queries with joins
SELECT u.name, COUNT(o.id) as order_count, SUM(o.total) as lifetime_value
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id
HAVING COUNT(o.id) > 5
ORDER BY lifetime_value DESC
LIMIT 100;

Choose when: Your data is naturally hierarchical (user profiles with nested preferences, product catalogs with variable attributes), you want schema flexibility, or your access pattern is "fetch entire document by key."
Avoid when: You need complex joins, multi-document transactions are central to your business logic, or you have highly relational data.
// Document stores excel at this: nested, variable-schema data
{
"_id": "product_12345",
"name": "Wireless Headphones",
"category": "electronics",
"attributes": {
"battery_life": "40 hours",
"noise_cancellation": true,
"driver_size": "40mm",
"bluetooth_version": "5.3"
},
"reviews": [
{ "user": "alice", "rating": 5, "text": "Great sound" },
{ "user": "bob", "rating": 4, "text": "Comfortable fit" }
],
"pricing": {
"base": 199.99,
"sale": 159.99,
"currency": "USD"
}
}

Choose when: You need massive write throughput, horizontal scaling across data centers, and your queries are known in advance (you model your tables around your queries). Time-series data, event logs, IoT sensor data.
Avoid when: You need ad-hoc queries, your access patterns are unpredictable, or you need strong consistency across partitions.
Choose when: Relationships ARE the data. Social networks, recommendation engines, fraud detection, knowledge graphs. Any problem where you need to traverse relationships of variable depth.
Avoid when: Your relationships are simple (one or two levels of joins in SQL). Graph databases add operational complexity; do not use them just because your data "has relationships."
Choose when: Your data is append-only with a timestamp, queries are time-range based, and you need efficient aggregation over time windows. Monitoring, IoT, financial market data.
Avoid when: Your data is not fundamentally time-ordered, or you need complex relational queries alongside time-series data (consider TimescaleDB, which is PostgreSQL-based).
Need ACID transactions + complex queries?
    YES --> PostgreSQL / MySQL
    NO  --> continue

Data is hierarchical with variable schema?
    YES --> MongoDB / DynamoDB
    NO  --> continue

Need massive write throughput + horizontal scaling?
    YES --> Cassandra / ScyllaDB
    NO  --> continue

Traversing deep relationships is the core query?
    YES --> Neo4j / Neptune
    NO  --> continue

Append-only, time-stamped, aggregation queries?
    YES --> InfluxDB / TimescaleDB / ClickHouse
    NO  --> continue

Simple key-value lookups at extreme scale?
    YES --> Redis / DynamoDB / Memcached
    NO  --> Just use PostgreSQL. Seriously.
That last line is not a joke. PostgreSQL handles an enormous range of workloads well. When in doubt, start with PostgreSQL and specialize later when you hit a concrete bottleneck.
Caching is conceptually simple and operationally nightmarish. Here are the four core patterns and when each makes sense.
The application manages the cache explicitly. On read: check cache, if miss then query database and populate cache. On write: update database, then invalidate (or update) cache.
async function getUser(userId: string): Promise<User> {
// 1. Check cache
const cached = await redis.get(`user:${userId}`);
if (cached) {
return JSON.parse(cached);
}
// 2. Cache miss - query database
const user = await db.query("SELECT * FROM users WHERE id = $1", [userId]);
// 3. Populate cache with TTL
await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
return user;
}
async function updateUser(userId: string, data: Partial<User>): Promise<void> {
// 1. Update database (source of truth)
await db.query("UPDATE users SET ... WHERE id = $1", [userId, ...]);
// 2. Invalidate cache (NOT update - avoids race conditions)
await redis.del(`user:${userId}`);
}

Pros: Only caches data that is actually requested. Cache failure does not block reads (just slower). Simple to implement.
Cons: Cache miss penalty (cold start). Possible stale data between database write and cache invalidation. N+1 problem if you are not careful.
Every write goes to both the cache and the database synchronously. Reads always hit the cache.
async function updateUser(userId: string, data: Partial<User>): Promise<void> {
// Write to database AND cache atomically (or as close as possible)
const updatedUser = await db.query(
"UPDATE users SET ... RETURNING *",
[userId, ...]
);
await redis.setex(`user:${userId}`, 3600, JSON.stringify(updatedUser));
}

Pros: Cache is always up to date. Reads are always fast.
Cons: Write latency increases (two writes per operation). Caches data that may never be read. If the cache write fails after the database write succeeds, you have an inconsistency window.
Writes go to the cache immediately, and the cache asynchronously flushes to the database in batches.
Write Request --> Cache (immediate) --> Background flush --> Database
                                        (batched, async)
Pros: Extremely fast writes. Can batch database writes for efficiency. Great for write-heavy workloads.
Cons: Risk of data loss if the cache crashes before flushing. Complexity of managing the write queue. Reads from the database will return stale data. This pattern terrifies me in most contexts, but it is the right choice for things like analytics counters or rate limit tracking.
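A toy write-behind cache makes the trade-off concrete. This sketch (class and method names are mine) uses a Map as a stand-in database and a manual flush(); a real implementation would flush on a timer or queue depth, and carries the crash-loss risk described above:

```typescript
// Write-behind sketch: writes land in memory; flush() batches them to the
// "database" (a Map here). Data written but not yet flushed is lost on crash.
class WriteBehindCache<V> {
  private cache = new Map<string, V>();
  private dirty = new Set<string>();

  constructor(private db: Map<string, V>) {}

  set(key: string, value: V): void {
    this.cache.set(key, value); // fast path: memory only
    this.dirty.add(key);        // remember what still needs persisting
  }

  get(key: string): V | undefined {
    return this.cache.get(key) ?? this.db.get(key);
  }

  // Batched and asynchronous in real life; synchronous here for clarity.
  flush(): number {
    let written = 0;
    this.dirty.forEach((key) => {
      this.db.set(key, this.cache.get(key)!);
      written++;
    });
    this.dirty.clear();
    return written;
  }
}
```

Note that between set() and flush(), the database is stale by design; that window is exactly why this pattern suits counters and rate-limit state, not money.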
The cache itself is responsible for loading data from the database on a miss. The application only talks to the cache, never directly to the database for reads.
Application --> Cache --> HIT?  Return data
                      --> MISS? Cache loads from DB, caches, returns
Pros: Clean application code (single data access layer). Cache handles its own population.
Cons: Requires a cache that supports read-through natively (or a wrapper). First request for each key is slow. Less flexibility in how data is loaded.
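The pattern is easy to sketch with the loader baked into the cache. The class and loader below are illustrative stand-ins, not a real library:

```typescript
// Read-through sketch: the cache owns its loader, so the application never
// queries the database directly for reads.
class ReadThroughCache<V> {
  private store = new Map<string, V>();
  loads = 0; // how many times we fell through to the loader

  constructor(private loader: (key: string) => V) {}

  get(key: string): V {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit;
    this.loads++;
    const value = this.loader(key); // the cache, not the app, loads from the "DB"
    this.store.set(key, value);
    return value;
  }
}

// Usage: the application only ever talks to the cache.
const userCache = new ReadThroughCache<string>((id) => `user-record-for-${id}`);
```

Compare with cache-aside above: the logic is the same, but here it lives inside the cache layer, which is what keeps application code clean.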
Cache invalidation is where every caching strategy eventually breaks down. Here are the problems and partial solutions:
Problem 1: Race conditions during invalidation.
Thread A: UPDATE user SET name='Alice' (DB)
Thread B: UPDATE user SET name='Bob' (DB)
Thread A: SET cache user = 'Alice' (Cache)
Thread B: SET cache user = 'Bob' (Cache)
Cache has 'Bob', which is correct. But what if the cache writes are reordered?
Thread A: UPDATE user SET name='Alice' (DB)
Thread B: UPDATE user SET name='Bob' (DB)
Thread B: SET cache user = 'Bob' (Cache)
Thread A: SET cache user = 'Alice' (Cache) <-- STALE!
Cache now has 'Alice' but DB has 'Bob'.
Solution: Invalidate (delete) instead of updating the cache. Let the next read repopulate it. This eliminates the reordering problem at the cost of one extra cache miss.
Problem 2: Cache stampede.
A popular key expires. 1,000 concurrent requests all see a cache miss simultaneously. All 1,000 hit the database.
Solution: Lock-based repopulation (only one request queries the database, others wait for the cache to be populated), probabilistic early expiration (refresh the cache before it actually expires, with some randomness to avoid synchronized refreshes), or stale-while-revalidate (serve the expired value while refreshing in the background).
async function getWithStampedeProtection(key: string): Promise<string | null> {
const cached = await redis.get(key);
if (cached) return cached;
// Try to acquire a lock
const lockKey = `lock:${key}`;
const acquired = await redis.set(lockKey, "1", "EX", 5, "NX");
if (acquired) {
// We got the lock - fetch from DB and populate cache
const value = await fetchFromDatabase(key);
await redis.setex(key, 3600, value);
await redis.del(lockKey);
return value;
} else {
// Another request is populating the cache - wait and retry
await sleep(50);
return getWithStampedeProtection(key);
}
}

Message queues are the duct tape of distributed systems. They solve problems of coupling, reliability, and scale — and introduce problems of ordering, idempotency, and operational complexity.
Ask yourself: does this work need to happen synchronously in the request path?
If a user signs up and you need to send a welcome email, update analytics, and provision resources — none of that needs to block the HTTP response. Put it on a queue.
HTTP Request --> API Server --> Response (200 OK, fast)
                     |
                     v
               Message Queue
                     |
           +---------+---------+
           |         |         |
      Send Email  Analytics  Provision
These are not interchangeable. They solve fundamentally different problems.
Apache Kafka is a distributed commit log. Messages are persisted, ordered within partitions, and can be replayed. Consumers track their own offset.
Use Kafka when:
- You need event replay (reprocessing historical events)
- You need strict ordering within a partition
- Multiple consumers need the same events (fan-out)
- You are building event sourcing or CQRS
- Throughput matters more than latency (batching is core to Kafka)
Do NOT use Kafka when:
- You have simple task queues (it is massive overkill)
- You need per-message routing/filtering (Kafka's model is partitions)
- You cannot operate a distributed system (Kafka clusters need care)
- Message volume is low (< 1000/sec, use something simpler)
RabbitMQ is a traditional message broker. It supports complex routing (topic exchanges, headers exchanges, fan-out), per-message acknowledgment, and priority queues.
Use RabbitMQ when:
- You need complex routing (message A goes to queue X, message B to queue Y)
- You need per-message acknowledgment and redelivery
- You need priority queues
- Your message patterns are request/reply or work distribution
- You want something battle-tested and relatively simple to operate
Do NOT use RabbitMQ when:
- You need message replay (consumed messages are gone)
- You need extreme throughput (Kafka wins here by an order of magnitude)
- You need multi-datacenter replication (possible but painful)
Amazon SQS is a managed queue service. No cluster to operate, no broker to monitor, auto-scales to any throughput.
Use SQS when:
- You want zero operational overhead
- You are already on AWS
- Your queuing needs are straightforward (FIFO or standard)
- You do not need complex routing or message replay
Do NOT use SQS when:
- You need strict ordering across all messages (FIFO queues have throughput limits)
- You need message replay
- You want to avoid AWS lock-in
- You need sub-second latency (SQS long-polling adds latency)
With any queue, messages can be delivered more than once. Network timeouts, consumer crashes, and broker retries all cause duplicate delivery. Your consumers MUST be idempotent.
async function processPayment(message: PaymentMessage): Promise<void> {
const idempotencyKey = message.paymentId;
// Check if we already processed this payment
const existing = await db.query(
"SELECT id FROM processed_payments WHERE payment_id = $1",
[idempotencyKey]
);
if (existing.rows.length > 0) {
// Already processed - acknowledge and skip
console.log(`Payment ${idempotencyKey} already processed, skipping`);
return;
}
// Process the payment
await db.query("BEGIN");
try {
await db.query("INSERT INTO payments ...", [/* payment data */]);
await db.query(
"INSERT INTO processed_payments (payment_id) VALUES ($1)",
[idempotencyKey]
);
await db.query("COMMIT");
} catch (error) {
await db.query("ROLLBACK");
throw error; // Message will be redelivered
}
}

Every system that faces the internet needs rate limiting. Every system that depends on downstream services needs backpressure. They are related but different.
Rate limiting protects your system from being overwhelmed, whether by legitimate traffic spikes, misbehaving clients, or denial-of-service attacks.
Token bucket is the most common algorithm. Each user has a bucket of tokens that refills at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected.
async function checkRateLimit(
userId: string,
maxTokens: number,
refillRate: number // tokens per second
): Promise<boolean> {
const key = `ratelimit:${userId}`;
const now = Date.now();
// Lua script for atomic token bucket
const script = `
local key = KEYS[1]
local max_tokens = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or max_tokens
local last_refill = tonumber(bucket[2]) or now
-- Refill tokens based on elapsed time
local elapsed = (now - last_refill) / 1000
tokens = math.min(max_tokens, tokens + (elapsed * refill_rate))
if tokens >= 1 then
tokens = tokens - 1
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return 1
else
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return 0
end
`;
const result = await redis.eval(script, 1, key, maxTokens, refillRate, now);
return result === 1;
}

Sliding window is better for APIs where you want to express limits as "100 requests per minute" and have them apply smoothly rather than resetting at fixed intervals.
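A sliding-window-counter limiter can be sketched in a few lines: keep counts for the current and previous fixed windows, and weight the previous window by how far into the current one you are. The class below is illustrative (names are mine, and the clock is injected so the logic can be tested deterministically):

```typescript
// Sliding-window-counter rate limiter sketch (single-process, illustrative).
class SlidingWindowLimiter {
  private current = 0;   // requests in the current fixed window
  private previous = 0;  // requests in the previous fixed window
  private windowStart: number;

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = Date.now
  ) {
    this.windowStart = this.now();
  }

  allow(): boolean {
    const t = this.now();
    const elapsed = t - this.windowStart;
    if (elapsed >= this.windowMs) {
      // Roll forward; if 2+ windows passed, the old counts are irrelevant.
      this.previous = elapsed >= 2 * this.windowMs ? 0 : this.current;
      this.current = 0;
      this.windowStart = t - (elapsed % this.windowMs);
    }
    // Weight the previous window by its remaining overlap with the rolling window.
    const fractionIntoWindow = (t - this.windowStart) / this.windowMs;
    const weighted = this.previous * (1 - fractionIntoWindow) + this.current;
    if (weighted >= this.limit) return false;
    this.current++;
    return true;
  }
}
```

For a distributed deployment the two counters would live in Redis, much like the token bucket above, but the weighting idea is identical.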
Backpressure is the system telling upstream "slow down, I cannot keep up." Without backpressure, queues grow without bound, memory fills up, and eventually something crashes.
Without backpressure:
Producer (fast) --> Queue (growing) --> Consumer (slow)
                    ^^^^^^^^^^^^^^^
                    Eventually: OOM, disk full, crash

With backpressure:
Producer (fast) --> Queue (bounded) --> Consumer (slow)
                         |
                    Queue full? --> Reject / slow producer / drop oldest
Backpressure strategies include bounded queues that reject when full, reactive streams that signal demand upstream, circuit breakers that stop calling failing services, and load shedding that deliberately drops low-priority requests to protect high-priority ones.
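The simplest of these, a bounded queue, fits in a short sketch. Here offer() returning false is the backpressure signal, and the optional dropOldest flag turns the same structure into a load-shedding queue (the names are mine, not a standard API):

```typescript
// Bounded queue sketch: reject-when-full is the simplest backpressure policy;
// dropOldest sheds the oldest work instead of refusing the newest.
class BoundedQueue<T> {
  private items: T[] = [];

  constructor(private capacity: number, private dropOldest = false) {}

  offer(item: T): boolean {
    if (this.items.length >= this.capacity) {
      if (!this.dropOldest) return false; // tell the producer to back off
      this.items.shift();                 // shed oldest to make room
    }
    this.items.push(item);
    return true;
  }

  poll(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}
```

The key property is that memory use is capped no matter how fast the producer runs; the pain is pushed back upstream, where it can actually be handled.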
The circuit breaker pattern deserves special attention:
class CircuitBreaker {
private failures: number = 0;
private lastFailure: number = 0;
private state: "closed" | "open" | "half-open" = "closed";
constructor(
private threshold: number = 5,
private resetTimeout: number = 30000
) {}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === "open") {
if (Date.now() - this.lastFailure > this.resetTimeout) {
this.state = "half-open";
} else {
throw new Error("Circuit breaker is open");
}
}
try {
const result = await fn();
if (this.state === "half-open") {
this.state = "closed";
this.failures = 0;
}
return result;
} catch (error) {
this.failures++;
this.lastFailure = Date.now();
if (this.failures >= this.threshold) {
this.state = "open";
}
throw error;
}
}
}

You have heard it a hundred times: in a distributed system, you can only have two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable, you must choose between CP (consistent but unavailable during partitions) and AP (available but inconsistent during partitions).
This is technically true but practically misleading for several reasons.
First, the "choice" is not system-wide. Different parts of your system can make different trade-offs. Your user authentication can be CP (you absolutely need consistency for password changes) while your news feed can be AP (showing a slightly stale feed is fine).
Second, CAP says nothing about what happens when there is no partition, which is the normal state of affairs. A system can be both consistent and available when the network is healthy — the question is what it sacrifices during the rare partition events.
PACELC extends CAP by asking: when there is no Partition, what do you trade off between Latency and Consistency?
IF Partition:
    Choose between Availability and Consistency (same as CAP)
ELSE (normal operation):
    Choose between Latency and Consistency
This is much more useful because you are making the latency/consistency trade-off constantly, not just during rare partition events.
System | During Partition | Normal Operation
---------------|--------------------|-----------------
DynamoDB | AP (available) | EL (low latency)
Cassandra | AP (available) | EL (low latency)
MongoDB | CP (consistent) | EC (consistent)
PostgreSQL | CP (consistent) | EC (consistent)
Cosmos DB | Configurable | Configurable
DynamoDB and Cassandra sacrifice consistency for both availability during partitions AND low latency during normal operations. PostgreSQL and MongoDB sacrifice availability during partitions AND accept higher latency during normal operations (synchronous replication). This maps much more closely to the actual decisions you make when designing systems.
Let me walk through designing a URL shortener to demonstrate how these patterns compose. But instead of the typical "draw these boxes" walkthrough, I will focus on the edge cases and decisions that matter.
Before drawing a single box, I would ask:
Let me assume: 100M new URLs per month, 10B redirects per month, URLs expire after 5 years, basic analytics (click count), no custom URLs for now.
Write QPS: 100M / (30 * 86400) ≈ 40 writes/sec (easy)
Read QPS: 10B / (30 * 86400) ≈ 3,800 reads/sec (moderate)
Peak read: ~10,000 reads/sec (manageable)
Storage:
- 100M * 12 months * 5 years = 6B records
- Each record: short_code (7B) + URL (200B) + metadata (50B) ≈ 257B
- Total: 6B * 257B ≈ 1.5 TB (fits on one machine, but sharding for availability)
Short URL length:
- 7 characters using [a-zA-Z0-9] = 62^7 ≈ 3.5 trillion combinations
- With 6B URLs, collision probability is negligible
This is where most candidates stumble. How do you generate unique, short, URL-safe strings?
Option A: Hash the URL, take first 7 characters.
Problem: collisions. MD5 and SHA-256 produce uniformly distributed output, but truncating to 7 characters shrinks the space enough that collisions become likely at scale (the birthday bound). You must handle collisions (check the database, regenerate).
Option B: Pre-generate keys.
Generate all possible 7-character strings, store them in a database, and pop one off when needed. Eliminates collision handling but requires managing a key pool.
Option C: Counter-based with base62 encoding.
Use a distributed counter (or range-based allocation) and encode the counter value in base62.
function encodeBase62(num: number): string {
const chars =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
let result = "";
while (num > 0) {
result = chars[num % 62] + result;
num = Math.floor(num / 62);
}
return result.padStart(7, "0");
}
// Counter 1 -> "0000001"
// Counter 1000000 -> "0004c92"
// Counter 62^7 - 1 -> "ZZZZZZZ" (the 7-character ceiling)

I would go with Option C using range-based allocation: each application server requests a range of IDs (say 10,000 at a time) from a coordination service (could be a simple database sequence). This avoids hot-spotting the counter and eliminates the need for collision checking.
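Range-based allocation is straightforward to sketch. In the code below (all names are my own), the "coordination service" is just a shared counter closure; in practice it would be a database sequence incremented by the block size:

```typescript
// Range-based ID allocation sketch: each server grabs a block of IDs and
// hands them out locally, with one coordination call per block, not per ID.
class RangeAllocator {
  private next = 0;
  private end = 0; // exclusive upper bound of the current block

  constructor(
    private fetchRange: (size: number) => number, // returns start of a fresh block
    private blockSize = 10_000
  ) {}

  nextId(): number {
    if (this.next >= this.end) {
      // Block exhausted: one round-trip to the coordinator for the next block.
      this.next = this.fetchRange(this.blockSize);
      this.end = this.next + this.blockSize;
    }
    return this.next++;
  }
}

// Stand-in coordinator: a monotonically advancing shared counter.
let counter = 0;
const coordinator = (size: number): number => {
  const start = counter;
  counter += size;
  return start;
};
```

Each ID would then be fed through encodeBase62 from Option C. The trade-off: if a server crashes mid-block, its unused IDs are simply wasted, which is fine when the keyspace is 3.5 trillion.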
                           +-------------+
Client --> CDN/LB -------->| API Servers |
                           | (stateless) |
                           +------+------+
                                  |
                       +----------+----------+
                       |                     |
                  +----+----+         +------+------+
                  |  Redis  |         | PostgreSQL  |
                  |  Cache  |         |  (sharded)  |
                  +---------+         +-------------+
Read path (redirect):
GET /abc1234 --> cache lookup --> if HIT, 301 redirect (sub-millisecond)

Write path (create short URL):
POST /shorten with the long URL

301 vs 302 redirect. A 301 (permanent redirect) means browsers will cache the redirect and never hit your server again. Great for performance, terrible for analytics. A 302 (temporary redirect) forces every click through your server. Most URL shorteners use 302 or 307 despite the performance cost, because analytics data is the product.
Hot URLs. A single URL that goes viral can receive millions of hits per second. Your cache must handle this gracefully. Since the data is immutable (the mapping never changes), you can cache aggressively with long TTLs and even replicate hot keys across multiple cache shards.
Abuse prevention. URL shorteners are frequently used for phishing and malware distribution. You need URL scanning before accepting a new URL, rate limiting on URL creation, and a reporting/takedown mechanism.
Database sharding strategy. Shard by the short code (hash-based sharding). This distributes load evenly and ensures each redirect query hits exactly one shard. Do not shard by user ID unless analytics queries by user are the primary access pattern.
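Hash-based routing is a one-liner on top of any decent string hash; FNV-1a is shown here purely as an illustration. Note that plain modulo makes changing the shard count painful, which is why fixed virtual shards or consistent hashing are often layered on top:

```typescript
// Route a short code to a shard: FNV-1a hash modulo shard count.
function fnv1a(s: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    hash ^= s.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep u32
  }
  return hash;
}

function shardFor(shortCode: string, shardCount: number): number {
  return fnv1a(shortCode) % shardCount;
}
```

Every redirect computes shardFor(shortCode, N) and queries exactly one shard; because the hash is deterministic, no lookup table is needed.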
After interviewing many candidates, these are the patterns I see most often in failed interviews.
Candidate immediately introduces microservices, Kubernetes, Kafka, Redis, Elasticsearch, a service mesh, and a data lake for a system that handles 100 requests per minute. Start simple. A monolith with a PostgreSQL database handles an enormous range of workloads. Add complexity only when you can articulate why the simpler solution fails.
A good interviewer will ask "what happens when we need to scale 100x?" That is your opportunity to evolve the design. But starting with a distributed system architecture for a low-scale problem signals that you do not understand the cost of complexity.
The prompt "design a chat application" is deliberately vague. Is it 1:1 or group chat? How many users? Do messages need to be persistent? Are there media attachments? Is there end-to-end encryption? What about message delivery guarantees (read receipts)?
Jumping straight into the design without scoping is the single biggest red flag. It suggests you build solutions before understanding problems.
Your beautiful architecture is worthless if you cannot deploy, monitor, and debug it. Good candidates discuss:
Drawing a box labeled "Load Balancer" is not enough. What kind of load balancer? What algorithm? Is it layer 4 or layer 7? What are the failure modes? Similarly, "add a cache" is not a strategy — what caching pattern, what eviction policy, what consistency guarantees?
In the real world, every architectural decision has a dollar cost. Running a Kafka cluster with 3 brokers and 100 partitions costs real money. If your message volume is 50 messages per second (roughly 130M messages per month), SQS at $0.40 per million requests comes to on the order of $50/month; that Kafka cluster costs hundreds of dollars per month in infrastructure before you count the operational time. The interviewer wants to see that you can make pragmatic decisions, not gold-plated ones.
System design is not about memorizing solutions. It is about building a repertoire of patterns and knowing when to apply them. Every design interview is fundamentally the same conversation:
The candidates who stand out are not the ones who draw the most complex diagrams. They are the ones who make deliberate, well-reasoned decisions and can articulate the trade-offs. They say things like "I chose eventual consistency here because this data is read-heavy and staleness of a few seconds is acceptable — if we needed strong consistency, we would use synchronous replication at the cost of higher write latency."
That sentence demonstrates more system design skill than any memorized architecture diagram ever could.
The best system designers I know share one trait: they are deeply uncomfortable with complexity. They add components reluctantly, only when the simpler approach demonstrably fails. They know that every box on the diagram is a potential failure point, a monitoring burden, and a maintenance cost. Simplicity is not a sign of inexperience — it is a sign of someone who has been paged at 3 AM because of a cascading failure in an over-engineered system and has the scars to prove it.