
PostgreSQL Query Optimization: From Slow to Sub-Millisecond

The PostgreSQL optimization techniques I use on production databases. EXPLAIN ANALYZE, index strategies, query rewrites, and the exact changes that cut query times from seconds to microseconds.


Last month I got a Slack message at 2 AM: "The dashboard is timing out." I SSH'd into the production box, opened pg_stat_activity, and found a single query scanning 14 million rows to return 12 results. The fix was a partial index that took 30 seconds to create. The query went from 4.2 seconds to 0.3 milliseconds.

That's the thing about PostgreSQL performance. The problems are almost never exotic. They're missed indexes, bad statistics, queries that made sense when the table had 10,000 rows but now it has 10 million. The fixes are usually simple — once you know where to look.

This post is everything I've learned about PostgreSQL query optimization from running production databases. No theory without practice. Every technique here has saved me real time on real systems.

The Debugging Mindset: Don't Guess, Measure#

The single biggest mistake I see developers make with slow queries is guessing. "Maybe we need an index on that column." "Maybe the JOIN is slow." "Maybe we should add more RAM."

Stop guessing. PostgreSQL has one of the best query analyzers of any database. Use it.

EXPLAIN — The Blueprint#

Plain EXPLAIN shows you what PostgreSQL plans to do, without actually running the query:

sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
                                    QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=128)
   Index Cond: (customer_id = 42)

This tells you the plan — an index scan — and the estimated cost. But it's a prediction. PostgreSQL might be completely wrong about the row count. I've seen estimates of "1 row" when the actual result was 50,000 rows. That kind of misestimate cascades into terrible plan choices.

Use plain EXPLAIN when you want a quick look at the plan without actually running the query. This matters when you're analyzing a DELETE that would modify data, or a query that takes 30 minutes to run.

EXPLAIN ANALYZE — The Truth#

sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
                                    QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=128)
                                                    (actual time=0.021..0.023 rows=1 loops=1)
   Index Cond: (customer_id = 42)
 Planning Time: 0.085 ms
 Execution Time: 0.041 ms

Now you see the actual execution time and the actual row count. This is where the real debugging happens. Compare rows=1 (estimated) with rows=1 (actual) — they match, so the planner made a good decision. When they don't match, that's your first clue.

Warning: EXPLAIN ANALYZE actually runs the query. If you're analyzing an UPDATE or DELETE, wrap it in a transaction:

sql
BEGIN;
EXPLAIN ANALYZE DELETE FROM sessions WHERE expires_at < now();
ROLLBACK;

EXPLAIN (ANALYZE, BUFFERS) — The Full Picture#

This is what I actually use 90% of the time:

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2026-01-01'
  AND o.status = 'completed';
                                    QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=425.00..1893.25 rows=2150 width=48)
            (actual time=2.341..15.892 rows=2347 loops=1)
   Hash Cond: (o.customer_id = c.id)
   Buffers: shared hit=312 read=45
   ->  Bitmap Heap Scan on orders o  (cost=12.50..1450.75 rows=2150 width=32)
                                     (actual time=0.512..12.105 rows=2347 loops=1)
         Recheck Cond: (created_at > '2026-01-01'::date)
         Filter: (status = 'completed')
         Rows Removed by Filter: 893
         Heap Blocks: exact=189
         Buffers: shared hit=195 read=45
         ->  Bitmap Index Scan on idx_orders_created_at  (cost=0.00..11.96 rows=3043 width=0)
                                                         (actual time=0.287..0.287 rows=3240 loops=1)
               Index Cond: (created_at > '2026-01-01'::date)
               Buffers: shared hit=12
   ->  Hash  (cost=287.50..287.50 rows=10000 width=24)
             (actual time=1.753..1.753 rows=10000 loops=1)
         Buffers: shared hit=117
         ->  Seq Scan on customers c  (cost=0.00..287.50 rows=10000 width=24)
                                      (actual time=0.008..0.892 rows=10000 loops=1)
               Buffers: shared hit=117
 Planning Time: 0.215 ms
 Execution Time: 16.147 ms

The BUFFERS output is critical. shared hit=312 means 312 pages came from the buffer cache (RAM). read=45 means 45 pages had to be read from disk. If you see a lot of read relative to hit, your shared_buffers might be too small, or the table is too large to stay cached.

I also use FORMAT JSON when I need to paste the plan into explain.dalibo.com for visualization. The tree view makes complex plans much easier to read:

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;

EXPLAIN Options I Actually Use#

Here's the full diagnostic query I run first for any slow query investigation:

sql
EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE)
SELECT ...;
  • TIMING — per-node timing (on by default with ANALYZE, but I'm explicit)
  • VERBOSE — shows output column lists and schema-qualified table names

Reading Query Plans Like a Detective#

Every query plan is a tree. PostgreSQL reads it bottom-up: the deepest indented nodes execute first, feeding results upward. Understanding the scan types is the foundation of everything else.

Seq Scan — The Full Table Scan#

Seq Scan on orders  (cost=0.00..25000.00 rows=1000000 width=128)

Sequential scan reads every single row in the table. For a million-row table, that's every page on disk.

But here's the nuance: Seq Scan is not always bad. If you're selecting 30% or more of the table, a sequential scan is actually faster than an index scan because sequential I/O is much faster than random I/O. PostgreSQL knows this. If it's choosing a Seq Scan when you have an index, check how selective your WHERE clause actually is.

When it's a problem: when you're selecting a tiny fraction of rows from a large table and there's no index.

Index Scan — The Targeted Lookup#

Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=5 width=128)
  Index Cond: (customer_id = 42)

Goes to the B-tree index, finds the matching entries, then fetches the actual rows from the table heap. Each row fetch is a random I/O operation to the heap. This is great for highly selective queries but each heap fetch has a cost.

Index Only Scan — The Holy Grail#

Index Only Scan using idx_orders_customer_total on orders  (cost=0.43..4.15 rows=5 width=12)
  Index Cond: (customer_id = 42)
  Heap Fetches: 0

This is the fastest scan type. PostgreSQL gets everything it needs from the index alone, never touching the table heap. You see this when all the columns you SELECT and WHERE on are in the index.

The catch: Heap Fetches: 0 means the visibility map is up to date. If your table has a lot of dead tuples (not vacuumed recently), PostgreSQL still has to check the heap to verify row visibility. This is one reason autovacuum matters for performance, not just disk space.

Bitmap Scan — The Middle Ground#

Bitmap Heap Scan on orders  (cost=45.00..1250.00 rows=5000 width=128)
  Recheck Cond: (created_at > '2026-01-01')
  ->  Bitmap Index Scan on idx_orders_created_at  (cost=0.00..43.75 rows=5000 width=0)
        Index Cond: (created_at > '2026-01-01')

Bitmap scans are PostgreSQL's answer to the "too many rows for an index scan, too few for a seq scan" problem. It builds a bitmap of which pages contain matching rows, sorts them by physical location, then fetches them in order. This converts random I/O into sequential I/O.

You'll often see bitmap scans when two or more indexes are combined:

Bitmap Heap Scan on orders
  Recheck Cond: ((status = 'completed') AND (created_at > '2026-01-01'))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_orders_status
              Index Cond: (status = 'completed')
        ->  Bitmap Index Scan on idx_orders_created_at
              Index Cond: (created_at > '2026-01-01')

This BitmapAnd combines two separate indexes. It's PostgreSQL saying "I don't have a composite index, but I can combine these two single-column indexes." It works, but a proper composite index would be faster.

Spotting Bad Estimates#

The number one thing I look for in a query plan is mismatched estimates:

Nested Loop  (cost=0.43..50.12 rows=1 width=128)
             (actual time=0.025..4521.337 rows=48753 loops=1)

The planner estimated 1 row. The actual result was 48,753 rows — a misestimate of nearly five orders of magnitude. The planner chose a Nested Loop because it thought it was joining against 1 row. With 48,753 rows, a Hash Join would have been orders of magnitude faster.
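This check is easy to script. A minimal sketch (the function name is mine) that walks the tree produced by EXPLAIN (ANALYZE, FORMAT JSON) — each node carries "Plan Rows", "Actual Rows", and its children under "Plans" — and flags anything off by more than a chosen factor:

```python
# Walk an EXPLAIN (ANALYZE, FORMAT JSON) plan node and flag row-count
# misestimates. Pass json.loads(plan_text)[0]["Plan"] as the root.
def find_bad_estimates(node, factor=10, path=()):
    flagged = []
    est = node.get("Plan Rows", 0)
    act = node.get("Actual Rows", 0)
    here = path + (node.get("Node Type", "?"),)
    # max(..., 1) guards against zero estimates, which are the
    # worst offenders rather than a division problem.
    if max(est, 1) * factor < act or max(act, 1) * factor < est:
        flagged.append((" -> ".join(here), est, act))
    for child in node.get("Plans", []):
        flagged.extend(find_bad_estimates(child, factor, here))
    return flagged
```

Run it over a plan like the Nested Loop above and it flags the offending node (estimated 1, actual 48,753) immediately.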

Common causes of bad estimates:

  • Stale statistics: Run ANALYZE on the table
  • Correlated columns: The planner assumes column values are independent. If status = 'shipped' and created_at > '2026-01-01' are correlated (most recent orders are shipped), the planner underestimates the combined selectivity
  • Custom functions in WHERE: The planner uses a default selectivity estimate (usually 0.5% for equality, 33% for range) when it can't analyze a function
  • Parameterized queries with generic plans: After 5 executions, PostgreSQL may switch to a generic plan that doesn't consider the actual parameter value

When you see bad estimates, the fix is usually one of: run ANALYZE, create extended statistics, rewrite the query, or use a CTE as an optimization fence (spelled WITH ... AS MATERIALIZED on PostgreSQL 12 and later).

sql
-- Create extended statistics for correlated columns
CREATE STATISTICS orders_status_date (dependencies)
ON status, created_at FROM orders;
ANALYZE orders;

Index Strategy: The Right Index for the Right Job#

Indexes aren't free. Each index slows down writes, consumes disk space, and needs to be maintained. The goal isn't "index everything" — it's "index exactly what you need."

B-tree — The Default Workhorse#

B-tree is the default and handles the vast majority of cases. It supports equality and range operators (=, <, >, <=, >=, BETWEEN, IN, IS NULL).

sql
CREATE INDEX idx_orders_created_at ON orders (created_at);

B-tree index column order matters enormously for composite indexes. The leftmost column is the primary sort, then the next, and so on. An index on (a, b, c) can efficiently answer:

  • WHERE a = 1
  • WHERE a = 1 AND b = 2
  • WHERE a = 1 AND b = 2 AND c = 3
  • WHERE a = 1 AND b > 5
  • WHERE a = 1 ORDER BY b

It cannot efficiently answer:

  • WHERE b = 2 (skips the first column)
  • WHERE c = 3 (skips the first two columns)
  • WHERE a = 1 AND c = 3 (gap in the middle — the index narrows on a, but c drops to a filter over every matching a row)

Think of it like a phone book sorted by last name, then first name. You can quickly find all "Smith" entries, or "Smith, John" specifically. But you can't quickly find all "John" entries without scanning the entire book.
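To make the analogy concrete, here's a toy model in Python (entirely mine, not PostgreSQL internals): the index is a sorted list of (a, b) tuples, a leftmost-prefix lookup narrows by binary search, and a lookup on the second column alone has to touch every entry.

```python
import bisect

# Toy composite index on (a, b): a sorted list of key tuples.
index = sorted((a, b) for a in range(100) for b in range(100))

def prefix_lookup(a):
    # Leftmost-prefix search: binary-search to the first (a, *) entry,
    # then read forward — O(log n + matches), like an Index Scan.
    lo = bisect.bisect_left(index, (a,))
    hi = bisect.bisect_left(index, (a + 1,))
    return index[lo:hi]

def second_column_lookup(b):
    # No usable prefix: every one of the 10,000 entries is inspected.
    return [key for key in index if key[1] == b]
```

Both lookups return 100 rows, but prefix_lookup inspects O(log n) entries to find them while second_column_lookup scans all 10,000 — the same asymmetry the planner faces.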

The rule: put equality conditions first, then range conditions, then sort columns.

sql
-- Query: WHERE status = 'active' AND created_at > '2026-01-01' ORDER BY total DESC
 
-- Good: equality first, then range, then sort
CREATE INDEX idx_orders_optimized ON orders (status, created_at, total);
 
-- Bad: range before equality
CREATE INDEX idx_orders_bad ON orders (created_at, status, total);

Covering Indexes with INCLUDE#

PostgreSQL 11+ lets you add non-key columns to an index with INCLUDE. These columns are stored in the leaf pages but aren't part of the B-tree structure. They enable index-only scans without bloating the tree:

sql
-- We frequently run:
-- SELECT id, total, status FROM orders WHERE customer_id = 42;
 
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id) INCLUDE (total, status);

Now PostgreSQL can satisfy the entire query from the index alone. Without INCLUDE, it would need to fetch total and status from the heap. The difference on a cold cache can be dramatic — I've seen queries go from 50ms to 0.2ms just by adding INCLUDE columns.

Partial Indexes — Index Only What Matters#

This is my single favorite PostgreSQL feature and the one most developers don't know about.

sql
-- Only 2% of orders are 'pending', but we query them constantly
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

This index is tiny compared to a full index on created_at. It only contains rows where status = 'pending'. Queries that match the WHERE clause use this small, fast index. Queries that don't match ignore it.

Real example from production: I had a sessions table with 50 million rows. Only about 200,000 were active (not expired). A full index on user_id was 1.2 GB; a partial index covering only the active sessions was 8 MB, and the query went from 12ms to 0.1ms because the entire index fit in cache. One caveat: index predicates must be immutable, so you can't write WHERE expires_at > now() — use a fixed cutoff you recreate periodically, or a flag column like WHERE active.

sql
-- Another common pattern: soft deletes
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;
 
-- Unique constraint only on active records
CREATE UNIQUE INDEX idx_users_unique_email ON users (email)
WHERE deleted_at IS NULL;

That last one is incredibly useful. It lets you have a unique email constraint on active users while allowing the same email to appear in deleted records.

GIN Indexes — Full-Text Search and JSONB#

GIN (Generalized Inverted Index) is the answer when you need to search inside values — arrays, JSONB documents, or full-text search vectors.

sql
-- JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
 
-- Now this is fast:
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
 
-- Full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (to_tsvector('english', title || ' ' || body));

GIN indexes are large and slow to build, but blazing fast to query. They're worth it for read-heavy workloads with complex containment queries.

For JSONB, there's also jsonb_path_ops which is smaller and faster for @> containment queries, but doesn't support existence operators (?, ?|, ?&):

sql
CREATE INDEX idx_products_metadata_pathops
ON products USING GIN (metadata jsonb_path_ops);

GiST Indexes — Geometric and Range Types#

GiST (Generalized Search Tree) handles overlapping data types: geometric shapes, ranges, full-text search (alternative to GIN, smaller but slower).

sql
-- IP range lookups (inet columns need the inet_ops opclass for GiST)
CREATE INDEX idx_ip_ranges ON ip_blocks USING GiST (ip_range inet_ops);
 
SELECT * FROM ip_blocks WHERE ip_range >> '192.168.1.50'::inet;
 
-- Temporal ranges (scheduling conflicts)
CREATE INDEX idx_bookings_period ON bookings USING GiST (
  tstzrange(start_time, end_time)
);
 
-- Find overlapping bookings:
SELECT * FROM bookings
WHERE tstzrange(start_time, end_time) && tstzrange('2026-03-01 10:00', '2026-03-01 12:00');

Hash Indexes — Niche but Useful#

Hash indexes are only useful for equality comparisons. Since PostgreSQL 10 they're WAL-logged and crash-safe. They're smaller than B-tree for wide columns and slightly faster for pure equality:

sql
-- Good for equality-only lookups on large text columns
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);
 
-- This is fast:
SELECT * FROM sessions WHERE session_token = 'abc123def456...';
 
-- This CANNOT use the hash index:
SELECT * FROM sessions WHERE session_token LIKE 'abc%';

In practice, I rarely use hash indexes. B-tree handles equality just fine, and the inability to do range queries or ordering makes hash indexes too inflexible for most real use cases.

When Indexes Hurt#

Every index has a cost:

  • Write amplification: Each INSERT updates every index on the table. A table with 8 indexes means 8 additional writes per INSERT
  • HOT updates blocked: Heap-Only Tuple (HOT) updates are a major optimization where PostgreSQL can update a row without updating indexes, but only if no indexed column changed. More indexes = more chances of blocking HOT updates
  • Vacuum overhead: More indexes means vacuum takes longer
  • Planner overhead: More indexes means the planner has more options to evaluate

I regularly audit indexes on production tables:

sql
-- Find unused indexes
SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
    SELECT conindid FROM pg_constraint
    WHERE contype IN ('p', 'u')  -- exclude primary key and unique constraints
  )
ORDER BY pg_relation_size(indexrelid) DESC;

I've found multi-gigabyte indexes that were created during a one-time migration and never used again. Dropping them sped up writes noticeably.

sql
-- Find duplicate indexes (same columns, same order)
SELECT
  a.indexrelid::regclass AS index1,
  b.indexrelid::regclass AS index2,
  a.indrelid::regclass AS table_name
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
  AND a.indexrelid < b.indexrelid
  AND a.indkey = b.indkey;

The N+1 Query Problem#

Every ORM developer hits this eventually. It's the most common performance problem I debug.

What N+1 Looks Like#

python
# Python / SQLAlchemy — the classic trap
orders = session.query(Order).filter_by(status='pending').all()
 
for order in orders:
    # This triggers a separate query for EACH order
    print(f"Customer: {order.customer.name}")

This generates:

sql
-- Query 1: Get all pending orders
SELECT * FROM orders WHERE status = 'pending';
 
-- Query 2: Get customer for order 1
SELECT * FROM customers WHERE id = 101;
 
-- Query 3: Get customer for order 2
SELECT * FROM customers WHERE id = 102;
 
-- ...Query 502: Get customer for order 500
SELECT * FROM customers WHERE id = 600;

501 queries instead of 1 or 2. Each query is fast individually, maybe 0.5ms. But 501 of them adds up to 250ms of just database time, plus network round-trip latency for each one.
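The arithmetic deserves emphasis, because the per-query network round trip often dominates. A back-of-the-envelope model (the numbers are illustrative, not measured):

```python
def n_plus_one_cost_ms(n_rows, query_ms=0.5, round_trip_ms=0.3):
    # One initial query plus one follow-up per row, each paying
    # execution time plus a client-server round trip.
    n_queries = 1 + n_rows
    return n_queries * (query_ms + round_trip_ms)
```

For 500 orders that's about 400ms before the application does any work; the JOIN version pays the round trip once.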

Detecting N+1 in Logs#

The fastest way to catch N+1 queries is to enable statement logging temporarily:

sql
-- Log all queries that take more than 0ms (i.e., all queries)
SET log_min_duration_statement = 0;
 
-- Better for production: log only slow queries
ALTER SYSTEM SET log_min_duration_statement = 50;  -- 50ms threshold
SELECT pg_reload_conf();

Then look at the logs. N+1 is unmistakable — you'll see hundreds of identical queries with different parameter values in rapid succession.
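You can also catch the pattern mechanically. A rough sketch (regexes and threshold are mine — pg_stat_statements does this properly) that normalizes literals out of logged statements and counts repeated shapes:

```python
import re
from collections import Counter

def normalize(sql):
    # Replace string and numeric literals with placeholders so that
    # "WHERE id = 101" and "WHERE id = 102" hash to the same shape.
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip().lower()

def suspected_n_plus_one(statements, threshold=10):
    # Many repeats of one shape in a short window is the N+1 signature.
    counts = Counter(normalize(s) for s in statements)
    return [(shape, n) for shape, n in counts.most_common() if n >= threshold]
```

Feed it the statements from one request's log window; a shape repeated dozens of times is almost always an ORM loop.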

A more targeted approach for development:

sql
-- Enable auto_explain for slow queries
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 100;  -- 100ms
SET auto_explain.log_nested_statements = ON;
SET auto_explain.log_analyze = ON;

This logs the full EXPLAIN ANALYZE output for any query over 100ms, including queries inside functions.

The Fix: Eager Loading or JOINs#

ORM approach — tell the ORM to load related data upfront:

python
# SQLAlchemy — joinedload fetches customers in the same query
from sqlalchemy.orm import joinedload
 
orders = (
    session.query(Order)
    .options(joinedload(Order.customer))
    .filter_by(status='pending')
    .all()
)

Raw SQL approach — just use a JOIN:

sql
SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending';

One query. Done.

JOINs vs Multiple Queries#

There's a valid debate here. Sometimes two queries are better than a JOIN:

sql
-- Approach 1: Single JOIN (might produce duplicate data if one-to-many)
SELECT o.*, i.product_name, i.quantity
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.status = 'pending';
 
-- Approach 2: Two queries (less data transfer if orders have many columns)
SELECT * FROM orders WHERE status = 'pending';
-- application collects order_ids = [1, 2, 3, ...]
SELECT * FROM order_items WHERE order_id = ANY($1);

The JOIN approach duplicates order data for every item. If each order has 20 items and the order row is wide, that's 20x data transfer. The two-query approach sends each order exactly once.

My rule: use JOINs for one-to-one relationships, consider separate queries for one-to-many when the "one" side is wide. But always benchmark — the network round-trip of a second query often costs more than the duplicate data.
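When you do go with two queries, the join moves into application code. A minimal stitching step in Python (assuming the driver returns rows as dicts):

```python
from collections import defaultdict

def attach_items(orders, items):
    # Group the flat order_items result by order_id, then attach each
    # group to its parent order — one pass over each result set,
    # the application-side equivalent of the JOIN.
    by_order = defaultdict(list)
    for item in items:
        by_order[item["order_id"]].append(item)
    for order in orders:
        order["items"] = by_order.get(order["id"], [])
    return orders
```

Each order row is transferred exactly once, no matter how many items it has.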

Common Query Rewrites#

Some queries are slow not because of missing indexes but because of how they're written. PostgreSQL's planner is good, but it's not magic.

Subquery vs JOIN vs CTE#

These three approaches can produce very different plans:

sql
-- Subquery in WHERE — often fine, sometimes terrible
SELECT * FROM orders
WHERE customer_id IN (
  SELECT id FROM customers WHERE country = 'US'
);
 
-- JOIN — usually the best choice
SELECT o.* FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'US';
 
-- CTE — materialized in PG 11 and below, optimized in PG 12+
WITH us_customers AS (
  SELECT id FROM customers WHERE country = 'US'
)
SELECT o.* FROM orders o
JOIN us_customers uc ON uc.id = o.customer_id;

In PostgreSQL 12+, CTEs are usually inlined (the planner treats them like subqueries), so performance is identical; you can still force materialization with AS MATERIALIZED when you actually want the fence. But in PostgreSQL 11 and below, CTEs are always optimization fences — the planner materializes them and can't push predicates through them. If you're still on PG 11, avoid CTEs for performance-critical queries.

EXISTS vs IN vs JOIN#

This comes up constantly:

sql
-- EXISTS — stops at first match, great for checking existence
SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM returns r WHERE r.order_id = o.id
);
 
-- IN with subquery — builds full result set first
SELECT * FROM orders
WHERE id IN (SELECT order_id FROM returns);
 
-- JOIN — can produce duplicates if returns has multiple rows per order
SELECT DISTINCT o.* FROM orders o
JOIN returns r ON r.order_id = o.id;

For large tables, EXISTS often wins because it can stop at the first match. That said, modern PostgreSQL usually rewrites both EXISTS and an IN subquery into the same semi-join plan, so check the actual plan before assuming a difference. The JOIN version can produce duplicates, requiring DISTINCT, which adds a sort or hash step.

My default: use EXISTS when checking for the existence of related rows. It's the most semantically clear and usually the fastest.

But there's a counterexample. If the subquery result set is small and you need it for multiple conditions:

sql
-- If the subquery returns few rows, IN is perfectly fine
SELECT * FROM orders
WHERE customer_id IN (1, 2, 3, 4, 5);

Avoiding SELECT *#

This isn't just a code style preference. It has real performance implications:

sql
-- Bad: fetches all 25 columns including a TEXT column with kilobytes of data
SELECT * FROM products WHERE category = 'electronics';
 
-- Good: fetches only what you need
SELECT id, name, price FROM products WHERE category = 'electronics';

With SELECT *:

  • Can't use index-only scans (all columns would need to be in the index)
  • Transfers more data over the network
  • Uses more memory for sorting and hashing
  • If someone adds a 10MB BYTEA column later, your existing queries silently get slower

Window Functions vs Subqueries#

Correlated subqueries that run once per row are one of the most common slow patterns. The fix is a single-pass rewrite — a grouped join, or one of PostgreSQL's window functions, which are among its most powerful features:

sql
-- Slow: correlated subquery runs once per row
SELECT
  o.*,
  (SELECT COUNT(*) FROM order_items i WHERE i.order_id = o.id) AS item_count,
  (SELECT SUM(amount) FROM payments p WHERE p.order_id = o.id) AS total_paid
FROM orders o
WHERE o.status = 'completed';
 
-- Fast: aggregate each child table once, then join the grouped results
-- (joining order_items and payments directly and applying window
-- functions would multiply rows and double-count across the two
-- one-to-many joins)
SELECT
  o.*,
  COALESCE(i.item_count, 0) AS item_count,
  p.total_paid
FROM orders o
LEFT JOIN (
  SELECT order_id, COUNT(*) AS item_count FROM order_items GROUP BY order_id
) i ON i.order_id = o.id
LEFT JOIN (
  SELECT order_id, SUM(amount) AS total_paid FROM payments GROUP BY order_id
) p ON p.order_id = o.id
WHERE o.status = 'completed';

Another common pattern — getting the latest row per group:

sql
-- Slow: correlated subquery
SELECT * FROM notifications n
WHERE n.created_at = (
  SELECT MAX(created_at) FROM notifications
  WHERE user_id = n.user_id
);
 
-- Fast: DISTINCT ON (PostgreSQL-specific, extremely useful)
SELECT DISTINCT ON (user_id) *
FROM notifications
ORDER BY user_id, created_at DESC;
 
-- Also fast: ROW_NUMBER window function
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM notifications
) sub
WHERE rn = 1;

DISTINCT ON is my go-to for this pattern. It's concise, readable, and PostgreSQL optimizes it well with the right index:

sql
CREATE INDEX idx_notifications_user_latest
ON notifications (user_id, created_at DESC);

Pagination Done Right#

OFFSET is a trap for large datasets:

sql
-- Page 1: fast
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 0;
 
-- Page 500: slow — PostgreSQL must scan and discard 9,980 rows
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 9980;
 
-- Page 5000: very slow — scans 99,980 rows to return 20
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 99980;

The fix is keyset pagination (also called cursor-based pagination):

sql
-- First page
SELECT * FROM products ORDER BY id LIMIT 20;
 
-- Next page: use the last id from the previous page
SELECT * FROM products WHERE id > 12345 ORDER BY id LIMIT 20;

This is always fast because it uses the index to jump directly to the right position, regardless of which "page" you're on. The trade-off is you can't jump to an arbitrary page number, but for infinite scroll or "next page" UIs, keyset pagination is strictly superior.

For complex sort orders:

sql
-- Keyset pagination with multiple sort columns
SELECT * FROM products
WHERE (price, id) > (29.99, 12345)
ORDER BY price, id
LIMIT 20;

This uses row-value comparison, which PostgreSQL handles efficiently with a composite index on (price, id).
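Python tuples compare lexicographically exactly like PostgreSQL row values, so the mechanics are easy to simulate in memory (a toy paginator, names mine):

```python
# Keyset pagination over an in-memory "table" sorted by (price, id),
# mirroring ORDER BY price, id.
def keyset_page(rows, after=None, limit=20):
    if after is not None:
        # Same semantics as WHERE (price, id) > ($1, $2).
        rows = [r for r in rows if (r["price"], r["id"]) > after]
    return rows[:limit]
```

Each call starts from the cursor rather than re-scanning and discarding earlier rows, which is why page 5,000 costs the same as page 1.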

Table Statistics and Vacuuming#

PostgreSQL's query planner makes decisions based on statistics about your data. Bad statistics lead to bad plans. It's that simple.

ANALYZE: Update the Statistics#

sql
-- Analyze a single table
ANALYZE orders;
 
-- Analyze the entire database
ANALYZE;
 
-- See current statistics for a column
SELECT
  tablename, attname, n_distinct, most_common_vals, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

The correlation value is particularly interesting. It ranges from -1 to 1 and measures how well the physical order of rows matches the logical order of the column. A correlation close to 1 or -1 means the data is physically sorted on that column, making range scans very efficient (sequential I/O). A correlation near 0 means random I/O for range queries.

sql
-- Increase statistics target for columns with skewed distributions
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

The default statistics target is 100, meaning PostgreSQL samples 300 * 100 = 30,000 rows. For columns with many distinct values or skewed distributions, increasing this to 500 or 1000 gives the planner better data at the cost of slightly longer ANALYZE times.

Autovacuum Tuning#

Autovacuum does two things: reclaims space from dead tuples (deleted or updated rows) and updates statistics. On busy tables, the default autovacuum settings are often too conservative.

sql
-- Check autovacuum status
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  last_vacuum,
  last_autovacuum,
  last_analyze,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

If you see tables with millions of dead tuples and the last vacuum was hours ago, your autovacuum is falling behind.

For high-churn tables (like sessions, job queues, or metrics), I set per-table autovacuum settings:

sql
-- More aggressive autovacuum for high-write tables
ALTER TABLE sessions SET (
  autovacuum_vacuum_scale_factor = 0.01,     -- vacuum at 1% dead tuples (default 20%)
  autovacuum_analyze_scale_factor = 0.005,   -- analyze at 0.5% changes (default 10%)
  autovacuum_vacuum_cost_delay = 1           -- less throttling (default 2ms since PG 12)
);

Table Bloat#

When PostgreSQL updates a row, it doesn't modify the row in place — it creates a new version and marks the old one as dead. Vacuum reclaims the dead rows, but the space is only reused by that table. The table file on disk doesn't shrink.

Over time, a heavily updated table can have significant bloat — the table is much larger on disk than the live data requires. This means more pages to scan, more I/O, more cache pressure.

sql
-- Largest tables with their index + TOAST overhead — a starting point
-- (accurate bloat estimation needs pgstattuple or the community bloat query)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(
    pg_total_relation_size(schemaname || '.' || tablename) -
    pg_relation_size(schemaname || '.' || tablename)
  ) AS index_and_toast_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;

For severely bloated tables, the nuclear option is VACUUM FULL — it rewrites the entire table. But it takes an exclusive lock, so you can't do it on a live production table without downtime. The better approach is pg_repack, which does the same thing without locking:

bash
pg_repack --table orders --no-kill-backend -d mydb

Connection Pooling#

Here's something that surprises a lot of developers: PostgreSQL connections are expensive. Each connection spawns a new process (not a thread), consumes about 5-10 MB of memory, and has a nontrivial fork overhead.

The default max_connections is 100. If you have an application server with 20 workers, each opening 5 connections, you're already at your limit. Add a background job processor, a monitoring tool, and a migration running somewhere, and you're in trouble.

Why You Need a Connection Pooler#

Without a pooler, if your application needs to handle 500 concurrent requests, you need 500 PostgreSQL connections. That's 5 GB of memory just for connection overhead, and PostgreSQL's performance degrades significantly beyond a few hundred connections due to process management overhead.

With PgBouncer in front of PostgreSQL, those 500 application connections map to maybe 20 actual PostgreSQL connections. The pooler queues requests when all database connections are busy.

PgBouncer Configuration#

ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
 
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
 
; Pool mode
pool_mode = transaction
 
; Pool sizing
default_pool_size = 20
max_client_conn = 500
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
 
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 30

Transaction vs Session Mode#

  • Transaction mode (pool_mode = transaction): Connections are returned to the pool after each transaction. This is what you want 95% of the time. Maximum connection reuse.
  • Session mode (pool_mode = session): Connections are held for the entire client session. Use this if you need prepared statements, SET commands, LISTEN/NOTIFY, or other session-level features.
  • Statement mode (pool_mode = statement): Connections are returned after each statement. Too restrictive for most applications — you can't even use explicit transactions.

The gotcha with transaction mode: you can't use prepared statements (they're session-level state), you can't use SET for session variables, and LISTEN/NOTIFY won't work as expected. Most ORMs have a way to disable prepared statements.

For Node.js with pg driver:

javascript
const pool = new Pool({
  host: '127.0.0.1',
  port: 6432,  // PgBouncer port, not PostgreSQL port
  database: 'mydb',
  max: 20,     // matches PgBouncer default_pool_size
  statement_timeout: 30000,
});
// Transaction mode can't use server-side prepared statements.
// With node-postgres that means never passing `name` in a query config;
// other drivers expose an explicit prepare on/off switch.

Pool Sizing Formula#

There's a common formula for optimal PostgreSQL connection pool size:

optimal_connections = (core_count * 2) + effective_spindle_count

For a modern server with 4 cores and an SSD (1 effective spindle):

optimal = (4 * 2) + 1 = 9

This seems counterintuitively low. But PostgreSQL is CPU-bound on most modern hardware (SSDs are fast enough that I/O rarely bottlenecks). More connections than this leads to context switching overhead that actually slows things down.

In practice, I usually set default_pool_size to 2-3x this number to handle burst traffic, with the understanding that at peak load, some queries will wait in the PgBouncer queue rather than all hitting PostgreSQL simultaneously.
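The rule of thumb above is simple enough to encode. A minimal sketch (the helper names are mine, not part of PgBouncer or any library):

```javascript
// Pool-sizing rule of thumb: (cores * 2) + effective spindles.
// An SSD counts as roughly one effective spindle.
function optimalConnections(coreCount, effectiveSpindleCount) {
  return coreCount * 2 + effectiveSpindleCount;
}

// Burst headroom: 2-3x the optimal number, as described above.
function defaultPoolSize(coreCount, effectiveSpindleCount, burstFactor = 2) {
  return optimalConnections(coreCount, effectiveSpindleCount) * burstFactor;
}

console.log(optimalConnections(4, 1)); // 4 cores + 1 SSD -> 9
console.log(defaultPoolSize(4, 1, 2)); // with 2x burst headroom -> 18
```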

Practical Checklist: The Exact Steps for Every Slow Query#

Here's my actual process when I get a "this query is slow" report. I follow these steps in this order, every time.

Step 1: Get the Actual Query#

Not "the endpoint is slow" — the actual SQL. If you're using an ORM, enable query logging:

sql
-- Temporarily log all queries over 100ms
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();
-- Revert when done: ALTER SYSTEM RESET log_min_duration_statement;

Or check pg_stat_statements for the top queries by total time:

sql
-- Top 10 queries by total execution time
SELECT
  queryid,
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

pg_stat_statements is the single most valuable extension for PostgreSQL performance. If you're not running it, install it now:

sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Step 2: Run EXPLAIN (ANALYZE, BUFFERS)#

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) <the slow query>;

Look for:

  1. Bad row estimates — actual rows very different from estimated rows
  2. Seq Scans on large tables — potential missing index
  3. Nested Loops with many rows — should be a Hash Join or Merge Join
  4. High buffer reads — cold cache or table too large
  5. Sort operations spilling to disk — increase work_mem or add index for sort
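For reference, here's what two of those red flags look like in real plan output. The plan below is illustrative (a hypothetical query against the orders table), but the strings to look for are PostgreSQL's actual wording:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

-- Red flag 1, bad row estimate: planner expected 12 rows, got 52,000
--   Seq Scan on orders (cost=0.00..12345.00 rows=12 ...) (actual ... rows=52000 ...)
-- Red flag 5, sort spilling to disk: look for "external merge"
--   Sort Method: external merge  Disk: 10240kB
-- A healthy in-memory sort reads:
--   Sort Method: quicksort  Memory: 4096kB
```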

Step 3: Check Table Statistics#

sql
-- When were stats last updated?
SELECT
  relname,
  last_analyze,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

If last_analyze is old or n_dead_tup is high relative to n_live_tup, run:

sql
ANALYZE orders;

Then re-run EXPLAIN ANALYZE. If the plan changes, stale statistics were the problem.

Step 4: Check Existing Indexes#

sql
-- What indexes exist on this table?
SELECT
  indexname,
  indexdef,
  idx_scan,
  idx_tup_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
JOIN pg_indexes
  ON pg_stat_user_indexes.indexrelname = pg_indexes.indexname
 AND pg_stat_user_indexes.schemaname = pg_indexes.schemaname
WHERE tablename = 'orders'
ORDER BY idx_scan DESC;

Maybe the index exists but isn't being used. Maybe the column order is wrong for your query.
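Column order is the usual culprit. A sketch with a hypothetical composite index:

```sql
-- This index leads with status, so it serves filters on status
-- (or on status plus customer_id):
CREATE INDEX idx_orders_status_customer ON orders (status, customer_id);

-- But a B-tree index generally can't be used efficiently when the
-- query skips the leading column, so this still falls back to a
-- sequential scan:
--   SELECT * FROM orders WHERE customer_id = 42;

-- If your hot query filters on customer_id alone, lead with it:
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
```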

Step 5: Create or Modify Indexes#

Based on the query plan, create the appropriate index. Test with EXPLAIN ANALYZE before and after.

sql
-- Create the index concurrently (doesn't lock the table)
CREATE INDEX CONCURRENTLY idx_orders_customer_status
ON orders (customer_id, status)
WHERE status IN ('pending', 'processing');
 
-- Verify the index is used
EXPLAIN (ANALYZE, BUFFERS) <the slow query>;

Always use CONCURRENTLY on production tables. A regular CREATE INDEX blocks all writes to the table for the duration of the build (reads still work); CONCURRENTLY builds without blocking writes, at the cost of a slower build. If a concurrent build fails, it leaves behind an invalid index that you must drop before retrying.

Step 6: Consider Query Rewrites#

If the index exists and statistics are fresh but the query is still slow, look at the query itself:

  • Can a subquery be rewritten as a JOIN?
  • Is OFFSET causing problems? Switch to keyset pagination
  • Are you selecting more columns than needed?
  • Can a correlated subquery become a window function?
  • Is a CTE preventing the planner from optimizing?
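Two of those rewrites, sketched against hypothetical orders and customers tables:

```sql
-- OFFSET pagination reads and throws away every skipped row:
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset pagination seeks directly past the last id the client saw
-- (here 100020, an illustrative value), using the primary key index:
SELECT * FROM orders WHERE id > 100020 ORDER BY id LIMIT 20;

-- A correlated IN subquery rewritten as EXISTS, which can stop
-- at the first matching row per customer:
SELECT * FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id AND o.status = 'pending'
);
```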

Step 7: Check Server Settings#

For specific query patterns, server settings matter:

sql
-- work_mem: memory for sorts and hash joins (per operation, not per connection)
-- Default is 4MB, which is too low for complex queries
SET work_mem = '64MB';  -- try this and re-run EXPLAIN ANALYZE
 
-- effective_cache_size: tells planner how much disk cache to expect
-- Set to ~75% of total RAM
SHOW effective_cache_size;
 
-- random_page_cost: ratio of random I/O to sequential I/O
-- Default is 4.0, set to 1.1-1.5 for SSDs
SHOW random_page_cost;

If changing work_mem fixes a sort-to-disk problem, consider increasing it globally. But be careful — it's per-operation, not per-connection. A complex query with 10 sort operations and work_mem = 256MB could use 2.5 GB for a single query.

Step 8: Monitor After the Fix#

Don't just fix and forget. Verify the fix holds up:

sql
-- Reset pg_stat_statements to see fresh data after your changes
SELECT pg_stat_statements_reset();
 
-- Check back in an hour/day
SELECT
  queryid,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round(max_exec_time::numeric, 2) AS max_time_ms,
  query
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY mean_exec_time DESC;

Bonus: Quick Wins I Apply to Every New Database#

These are settings and practices I apply to every PostgreSQL database I set up, before any performance issues arise:

sql
-- 1. Install pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
 
-- 2. Enable slow query logging (50ms is my threshold)
ALTER SYSTEM SET log_min_duration_statement = 50;
 
-- 3. SSD-appropriate settings
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;
 
-- 4. Increase work_mem from the anemic default
ALTER SYSTEM SET work_mem = '32MB';
 
-- 5. Set effective_cache_size to 75% of RAM
-- For a 16GB server:
ALTER SYSTEM SET effective_cache_size = '12GB';
 
-- 6. Set shared_buffers to 25% of RAM
-- For a 16GB server:
ALTER SYSTEM SET shared_buffers = '4GB';
 
-- 7. Improve autovacuum responsiveness
ALTER SYSTEM SET autovacuum_naptime = '10s';
 
-- Apply changes
SELECT pg_reload_conf();
-- Note: shared_buffers requires a restart

And a monitoring query I run weekly:

sql
-- Tables that need attention
SELECT
  schemaname || '.' || relname AS table,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  n_live_tup,
  n_dead_tup,
  CASE WHEN n_live_tup > 0
    THEN round(100.0 * n_dead_tup / n_live_tup, 1)
    ELSE 0
  END AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

Final Thoughts#

PostgreSQL query optimization isn't a dark art. It's a systematic process:

  1. Measure — don't guess. EXPLAIN (ANALYZE, BUFFERS) is your best friend.
  2. Understand the plan — learn to read scan types, join types, and row estimates.
  3. Index strategically — the right index on the right columns, in the right order. Partial indexes and covering indexes are underused superpowers.
  4. Write better queries — EXISTS over IN for existence checks, keyset pagination over OFFSET, window functions over correlated subqueries.
  5. Maintain the database — autovacuum tuning, statistics updates, connection pooling.
  6. Monitor continuously — pg_stat_statements tells you where your database spends its time. Check it regularly.

The difference between a query that takes 4 seconds and one that takes 0.3 milliseconds is rarely hardware. It's almost always knowledge — knowing where to look and what to change. And now you know where to look.
