PostgreSQL Query Optimization: Slow से Sub-Millisecond तक

पिछले महीने मुझे रात 2 बजे Slack message आया: "Dashboard timeout हो रहा है।" मैंने production box में SSH किया, pg_stat_activity खोला, और एक single query मिली जो 12 results return करने के लिए 14 million rows scan कर रही थी। Fix एक partial index था जिसे बनाने में 30 seconds लगे। Query 4.2 seconds से 0.3 milliseconds हो गई।

PostgreSQL performance की बात ही यही है। Problems लगभग कभी exotic नहीं होतीं। Missed indexes, bad statistics, queries जो 10,000 rows वाली table पर सही थीं लेकिन अब 10 million हैं। Fixes usually simple हैं — एक बार जब पता हो कहां देखना है।

यह post production databases चलाने से PostgreSQL query optimization के बारे में मैंने जो कुछ सीखा है वो सब है। बिना practice के कोई theory नहीं। यहां हर technique ने real systems पर मेरा real time बचाया है।

Debugging Mindset: Guess मत करो, Measure करो#

Slow queries के साथ developers की सबसे बड़ी गलती guessing है। "शायद हमें उस column पर index चाहिए।" "शायद JOIN slow है।" "शायद हमें और RAM add करनी चाहिए।"

Guessing बंद करो। PostgreSQL के पास किसी भी database के सबसे अच्छे query analyzers में से एक है। इसे इस्तेमाल करो।

EXPLAIN — Blueprint#

Plain EXPLAIN दिखाता है कि PostgreSQL क्या plan करता है, बिना query actually run किए:

sql

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

                                    QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=128)
   Index Cond: (customer_id = 42)

यह बताता है plan — index scan — और estimated cost। लेकिन यह prediction है। PostgreSQL row count के बारे में पूरी तरह गलत हो सकता है। मैंने "1 row" के estimates देखे हैं जब actual result 50,000 rows था। इस तरह का misestimate terrible plan choices में cascade होता है।

Plain EXPLAIN तब इस्तेमाल करो जब query actually run किए बिना plan पर quick look चाहिए। यह matter करता है जब DELETE analyze कर रहे हो जो data modify करेगा, या query जो 30 minutes लेती है।

EXPLAIN ANALYZE — सच#

sql

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

                                    QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=1 width=128)
                                                    (actual time=0.021..0.023 rows=1 loops=1)
   Index Cond: (customer_id = 42)
 Planning Time: 0.085 ms
 Execution Time: 0.041 ms

अब actual execution time और actual row count दिखता है। असली debugging यहीं होती है। Estimated rows=1 को actual rows=1 से compare करो — match करते हैं, तो planner ने अच्छा decision लिया। जब match नहीं करते, वो पहला clue है।

Warning: EXPLAIN ANALYZE query actually run करता है। अगर UPDATE या DELETE analyze कर रहे हो, transaction में wrap करो:

sql

BEGIN;
EXPLAIN ANALYZE DELETE FROM sessions WHERE expires_at < now();
ROLLBACK;

EXPLAIN (ANALYZE, BUFFERS) — पूरी तस्वीर#

90% समय मैं यही actually इस्तेमाल करता हूं:

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2026-01-01'
  AND o.status = 'completed';

                                    QUERY PLAN
---------------------------------------------------------------------------
 Hash Join  (cost=425.00..1893.25 rows=2150 width=48)
            (actual time=2.341..15.892 rows=2347 loops=1)
   Hash Cond: (o.customer_id = c.id)
   Buffers: shared hit=312 read=45
   ->  Bitmap Heap Scan on orders o  (cost=12.50..1450.75 rows=2150 width=32)
                                     (actual time=0.512..12.105 rows=2347 loops=1)
         Recheck Cond: (created_at > '2026-01-01'::date)
         Filter: (status = 'completed')
         Rows Removed by Filter: 893
         Heap Blocks: exact=189
         Buffers: shared hit=195 read=45
         ->  Bitmap Index Scan on idx_orders_created_at  (cost=0.00..11.96 rows=3043 width=0)
                                                         (actual time=0.287..0.287 rows=3240 loops=1)
               Index Cond: (created_at > '2026-01-01'::date)
               Buffers: shared hit=12
   ->  Hash  (cost=287.50..287.50 rows=10000 width=24)
             (actual time=1.753..1.753 rows=10000 loops=1)
         Buffers: shared hit=117
         ->  Seq Scan on customers c  (cost=0.00..287.50 rows=10000 width=24)
                                      (actual time=0.008..0.892 rows=10000 loops=1)
               Buffers: shared hit=117
 Planning Time: 0.215 ms
 Execution Time: 16.147 ms

BUFFERS output critical है। shared hit=312 मतलब 312 pages buffer cache (RAM) से आए। read=45 मतलब 45 pages disk से read करने पड़े। अगर hit के मुकाबले बहुत ज़्यादा read दिखे, तो shared_buffers बहुत छोटा हो सकता है, या table बहुत बड़ी है cached रहने के लिए।

मैं FORMAT JSON भी इस्तेमाल करता हूं जब plan explain.dalibo.com में visualization के लिए paste करना हो। Tree view complex plans को पढ़ना बहुत आसान बनाता है:

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;

EXPLAIN Options जो मैं Actually इस्तेमाल करता हूं#

किसी भी slow query investigation के लिए मेरी पहली full diagnostic query यह है:

sql

EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE)
SELECT ...;

TIMING — per-node timing (ANALYZE के साथ default on है, लेकिन मैं explicit रहता हूं)
VERBOSE — output column lists और schema-qualified table names दिखाता है

Query Plans को Detective की तरह पढ़ना#

हर query plan एक tree है। PostgreSQL bottom-up पढ़ता है: सबसे deep indented nodes पहले execute होते हैं, results ऊपर feed करते हैं। Scan types समझना बाकी सब की foundation है।

Seq Scan — Full Table Scan#

Seq Scan on orders  (cost=0.00..25000.00 rows=1000000 width=128)

Sequential scan table की हर single row पढ़ता है। Million-row table के लिए, disk पर हर page।

लेकिन बारीकी यह है: Seq Scan हमेशा बुरा नहीं है। अगर table का 30% या ज़्यादा select कर रहे हो, sequential scan actually index scan से faster है क्योंकि sequential I/O random I/O से बहुत faster है। PostgreSQL यह जानता है। अगर index होने पर भी Seq Scan choose कर रहा है, check करो WHERE clause actually कितना selective है।

कब problem है: जब बड़ी table से rows का tiny fraction select कर रहे हो और कोई index नहीं है।

Index Scan — Targeted Lookup#

Index Scan using idx_orders_customer_id on orders  (cost=0.43..8.45 rows=5 width=128)
  Index Cond: (customer_id = 42)

B-tree index में जाता है, matching entries ढूंढता है, फिर actual rows table heap से fetch करता है। हर row fetch heap पर random I/O operation है। Highly selective queries के लिए great लेकिन हर heap fetch की cost है।

Index Only Scan — Holy Grail#

Index Only Scan using idx_orders_customer_total on orders  (cost=0.43..4.15 rows=5 width=12)
  Index Cond: (customer_id = 42)
  Heap Fetches: 0

सबसे fast scan type। PostgreSQL index से ही सब कुछ पा लेता है, table heap कभी touch नहीं करता। यह तब दिखता है जब SELECT और WHERE के सभी columns index में हैं।

पकड़: Heap Fetches: 0 मतलब visibility map up to date है। अगर table में बहुत dead tuples हैं (recently vacuum नहीं हुआ), PostgreSQL को अभी भी row visibility verify करने के लिए heap check करना पड़ता है। यह एक कारण है कि autovacuum performance के लिए matter करता है, सिर्फ disk space के लिए नहीं।

Bitmap Scan — बीच का रास्ता#

Bitmap Heap Scan on orders  (cost=45.00..1250.00 rows=5000 width=128)
  Recheck Cond: (created_at > '2026-01-01')
  ->  Bitmap Index Scan on idx_orders_created_at  (cost=0.00..43.75 rows=5000 width=0)
        Index Cond: (created_at > '2026-01-01')

Bitmap scans PostgreSQL का जवाब है "index scan के लिए बहुत ज़्यादा rows, seq scan के लिए बहुत कम" समस्या का। यह bitmap बनाता है कि कौन से pages में matching rows हैं, physical location से sort करता है, फिर order में fetch करता है। यह random I/O को sequential I/O में convert करता है।

अक्सर bitmap scans तब दिखते हैं जब दो या ज़्यादा indexes combine होते हैं:

Bitmap Heap Scan on orders
  Recheck Cond: ((status = 'completed') AND (created_at > '2026-01-01'))
  ->  BitmapAnd
        ->  Bitmap Index Scan on idx_orders_status
              Index Cond: (status = 'completed')
        ->  Bitmap Index Scan on idx_orders_created_at
              Index Cond: (created_at > '2026-01-01')

यह BitmapAnd दो separate indexes combine करता है। PostgreSQL कह रहा है "मेरे पास composite index नहीं है, लेकिन इन दो single-column indexes को combine कर सकता हूं।" काम करता है, लेकिन proper composite index ज़्यादा fast होगा।

Bad Estimates पकड़ना#

Query plan में नंबर एक चीज़ जो मैं देखता हूं — mismatched estimates:

Nested Loop  (cost=0.43..50.12 rows=1 width=128)
             (actual time=0.025..4521.337 rows=48753 loops=1)

Planner ने 1 row estimate किया। Actual result 48,753 rows था। पांच orders of magnitude का misestimate। Planner ने Nested Loop choose किया क्योंकि उसे लगा 1 row के against join कर रहा है। 48,753 rows के साथ, Hash Join orders of magnitude faster होता।

Bad estimates के common causes:

Stale statistics: Table पर ANALYZE run करो
Correlated columns: Planner assume करता है column values independent हैं। अगर status = 'shipped' और created_at > '2026-01-01' correlated हैं (most recent orders shipped हैं), planner combined selectivity underestimate करता है
WHERE में custom functions: Planner default selectivity estimate इस्तेमाल करता है (usually equality के लिए 0.5%, range के लिए 33%) जब function analyze नहीं कर सकता
Generic plans वाली parameterized queries: 5 executions के बाद, PostgreSQL generic plan पर switch कर सकता है जो actual parameter value consider नहीं करता

Bad estimates दिखने पर, fix usually इनमें से एक है: ANALYZE run करो, extended statistics बनाओ, query rewrite करो, या CTE optimization fence के रूप में इस्तेमाल करो।

sql

-- Correlated columns के लिए extended statistics बनाओ
CREATE STATISTICS orders_status_date (dependencies)
ON status, created_at FROM orders;
ANALYZE orders;

Index Strategy: सही Job के लिए सही Index#

Indexes free नहीं हैं। हर index writes slow करता है, disk space consume करता है, और maintain करना पड़ता है। Goal "सब कुछ index करो" नहीं है — "बिल्कुल वही index करो जो चाहिए।"

B-tree — Default Workhorse#

B-tree default है और अधिकांश cases handle करता है। Equality और range operators support करता है (=, <, >, <=, >=, BETWEEN, IN, IS NULL)।

sql

CREATE INDEX idx_orders_created_at ON orders (created_at);

Composite indexes में B-tree index column order बहुत matter करता है। Leftmost column primary sort है, फिर अगला, और इसी तरह। (a, b, c) पर index efficiently answer कर सकता है:

WHERE a = 1
WHERE a = 1 AND b = 2
WHERE a = 1 AND b = 2 AND c = 3
WHERE a = 1 AND b > 5
WHERE a = 1 ORDER BY b

यह efficiently answer नहीं कर सकता:

WHERE b = 2 (पहला column skip)
WHERE c = 3 (पहले दो columns skip)
WHERE a = 1 AND c = 3 (बीच में gap — a condition use होती है, c condition filter चाहिए)

इसे phone book की तरह सोचो जो last name से sorted है, फिर first name। आप तुरंत सभी "Smith" entries ढूंढ सकते हो, या specifically "Smith, John"। लेकिन पूरी book scan किए बिना सभी "John" entries तुरंत नहीं ढूंढ सकते।

नियम: equality conditions पहले रखो, फिर range conditions, फिर sort columns।

sql

-- Query: WHERE status = 'active' AND created_at > '2026-01-01' ORDER BY total DESC
 
-- अच्छा: equality पहले, फिर range, फिर sort
CREATE INDEX idx_orders_optimized ON orders (status, created_at, total);
 
-- बुरा: equality से पहले range
CREATE INDEX idx_orders_bad ON orders (created_at, status, total);

INCLUDE के साथ Covering Indexes#

PostgreSQL 11+ INCLUDE से index में non-key columns add करने देता है। ये columns leaf pages में store होते हैं लेकिन B-tree structure का हिस्सा नहीं। Tree bloat किए बिना index-only scans enable करते हैं:

sql

-- हम बार-बार run करते हैं:
-- SELECT id, total, status FROM orders WHERE customer_id = 42;
 
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id) INCLUDE (total, status);

अब PostgreSQL entire query अकेले index से satisfy कर सकता है। INCLUDE के बिना, heap से total और status fetch करना पड़ता। Cold cache पर difference dramatic हो सकता है — मैंने queries 50ms से 0.2ms होते देखी हैं बस INCLUDE columns add करके।

Partial Indexes — सिर्फ वही Index करो जो Matter करता है#

यह मेरी single favorite PostgreSQL feature है और वो जो ज़्यादातर developers को पता नहीं।

sql

-- केवल 2% orders 'pending' हैं, लेकिन हम उन्हें लगातार query करते हैं
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

यह index created_at पर full index के मुकाबले बहुत छोटा है। इसमें सिर्फ वो rows हैं जहां status = 'pending'। WHERE clause match करने वाली queries यह छोटा, fast index इस्तेमाल करती हैं। Match न करने वाली queries ignore करती हैं।

Production से real example: sessions table 50 million rows। सिर्फ करीब 200,000 active (expire नहीं हुई)। user_id पर full index 1.2 GB था। Partial index WHERE expires_at > now() 8 MB था। Query 12ms से 0.1ms हो गई क्योंकि पूरा index cache में fit हो गया।

sql

-- एक और common pattern: soft deletes
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;
 
-- सिर्फ active records पर unique constraint
CREATE UNIQUE INDEX idx_users_unique_email ON users (email)
WHERE deleted_at IS NULL;

वो आखिरी वाला incredibly useful है। Active users पर unique email constraint रखने देता है जबकि deleted records में same email appear हो सकता है।

GIN Indexes — Full-Text Search और JSONB#

GIN (Generalized Inverted Index) जवाब है जब values के अंदर search करना हो — arrays, JSONB documents, या full-text search vectors।

sql

-- JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
 
-- अब यह fast है:
SELECT * FROM products WHERE metadata @> '{"color": "red"}';
 
-- Full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (to_tsvector('english', title || ' ' || body));

GIN indexes बड़े और build करने में slow हैं, लेकिन query करने में blazing fast। Complex containment queries वाले read-heavy workloads के लिए worth it हैं।

JSONB के लिए, jsonb_path_ops भी है जो @> containment queries के लिए छोटा और faster है, लेकिन existence operators (?, ?|, ?&) support नहीं करता:

sql

CREATE INDEX idx_products_metadata_pathops
ON products USING GIN (metadata jsonb_path_ops);

GiST Indexes — Geometric और Range Types#

GiST (Generalized Search Tree) overlapping data types handle करता है: geometric shapes, ranges, full-text search (GIN का alternative, छोटा लेकिन slower)।

sql

-- IP range lookups
CREATE INDEX idx_ip_ranges ON ip_blocks USING GiST (ip_range);
 
SELECT * FROM ip_blocks WHERE ip_range @> '192.168.1.50'::inet;
 
-- Temporal ranges (scheduling conflicts)
CREATE INDEX idx_bookings_period ON bookings USING GiST (
  tstzrange(start_time, end_time)
);
 
-- Overlapping bookings ढूंढो:
SELECT * FROM bookings
WHERE tstzrange(start_time, end_time) && tstzrange('2026-03-01 10:00', '2026-03-01 12:00');

Hash Indexes — Niche लेकिन Useful#

Hash indexes सिर्फ equality comparisons के लिए useful हैं। PostgreSQL 10 से WAL-logged और crash-safe हैं। Wide columns के लिए B-tree से छोटे और pure equality के लिए थोड़ा faster:

sql

-- बड़े text columns पर equality-only lookups के लिए अच्छे
CREATE INDEX idx_sessions_token ON sessions USING HASH (session_token);
 
-- यह fast है:
SELECT * FROM sessions WHERE session_token = 'abc123def456...';
 
-- यह hash index इस्तेमाल नहीं कर सकता:
SELECT * FROM sessions WHERE session_token LIKE 'abc%';

Practice में, मैं hash indexes कम ही इस्तेमाल करता हूं। B-tree equality ठीक से handle करता है, और range queries या ordering की inability hash indexes को ज़्यादातर real use cases के लिए बहुत inflexible बनाती है।

जब Indexes नुकसान करते हैं#

हर index की cost है:

Write amplification: हर INSERT table के हर index update करता है। 8 indexes वाली table मतलब प्रति INSERT 8 additional writes
HOT updates blocked: Heap-Only Tuple (HOT) updates major optimization है जहां PostgreSQL बिना indexes update किए row update कर सकता है, लेकिन तभी जब कोई indexed column change न हो। ज़्यादा indexes = HOT updates block होने के ज़्यादा chances
Vacuum overhead: ज़्यादा indexes मतलब vacuum ज़्यादा समय लेता है
Planner overhead: ज़्यादा indexes मतलब planner के पास evaluate करने के ज़्यादा options

मैं regularly production tables पर indexes audit करता हूं:

sql

-- Unused indexes ढूंढो
SELECT
  schemaname, tablename, indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
    SELECT conindid FROM pg_constraint
    WHERE contype IN ('p', 'u')  -- primary key और unique constraints exclude
  )
ORDER BY pg_relation_size(indexrelid) DESC;

मैंने multi-gigabyte indexes पाए हैं जो one-time migration के दौरान बनाए गए और फिर कभी इस्तेमाल नहीं हुए। उन्हें drop करने से writes noticeably faster हुई।

sql

-- Duplicate indexes ढूंढो (same columns, same order)
SELECT
  a.indexrelid::regclass AS index1,
  b.indexrelid::regclass AS index2,
  a.indrelid::regclass AS table_name
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
  AND a.indexrelid < b.indexrelid
  AND a.indkey = b.indkey;

N+1 Query Problem#

हर ORM developer eventually इसे hit करता है। यह सबसे common performance problem है जो मैं debug करता हूं।

N+1 कैसा दिखता है#

python

# Python / SQLAlchemy — classic trap
orders = session.query(Order).filter_by(status='pending').all()
 
for order in orders:
    # यह हर ORDER के लिए separate query trigger करता है
    print(f"Customer: {order.customer.name}")

यह generate करता है:

sql

-- Query 1: सभी pending orders लाओ
SELECT * FROM orders WHERE status = 'pending';
 
-- Query 2: Order 1 का customer लाओ
SELECT * FROM customers WHERE id = 101;
 
-- Query 3: Order 2 का customer लाओ
SELECT * FROM customers WHERE id = 102;
 
-- ...Query 502: Order 500 का customer लाओ
SELECT * FROM customers WHERE id = 600;

1 या 2 की बजाय 501 queries। हर query individually fast है, शायद 0.5ms। लेकिन 501 queries मिलकर 250ms सिर्फ database time बनता है, plus हर एक के लिए network round-trip latency।

Logs में N+1 Detect करना#

N+1 queries पकड़ने का सबसे तेज़ तरीका temporarily statement logging enable करना है:

sql

-- 0ms से ज़्यादा लेने वाली सभी queries log करो (यानी, सभी queries)
SET log_min_duration_statement = 0;
 
-- Production के लिए बेहतर: सिर्फ slow queries log करो
ALTER SYSTEM SET log_min_duration_statement = 50;  -- 50ms threshold
SELECT pg_reload_conf();

फिर logs देखो। N+1 unmistakable है — तेज़ succession में अलग-अलग parameter values वाली सैकड़ों identical queries दिखेंगी।

Development के लिए ज़्यादा targeted approach:

sql

-- Slow queries के लिए auto_explain enable करो
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 100;  -- 100ms
SET auto_explain.log_nested_statements = ON;
SET auto_explain.log_analyze = ON;

यह 100ms से ज़्यादा लेने वाली किसी भी query का full EXPLAIN ANALYZE output log करता है, functions के अंदर queries सहित।

Fix: Eager Loading या JOINs#

ORM approach — ORM को बताओ related data पहले से load करो:

python

# SQLAlchemy — joinedload same query में customers fetch करता है
from sqlalchemy.orm import joinedload
 
orders = (
    session.query(Order)
    .options(joinedload(Order.customer))
    .filter_by(status='pending')
    .all()
)

Raw SQL approach — बस JOIN इस्तेमाल करो:

sql

SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending';

एक query। हो गया।

JOINs vs Multiple Queries#

यहां valid debate है। कभी-कभी दो queries JOIN से बेहतर हैं:

sql

-- Approach 1: Single JOIN (one-to-many में duplicate data produce कर सकता है)
SELECT o.*, i.product_name, i.quantity
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.status = 'pending';
 
-- Approach 2: दो queries (अगर orders में बहुत columns हैं तो कम data transfer)
SELECT * FROM orders WHERE status = 'pending';
-- application order_ids collect करती है = [1, 2, 3, ...]
SELECT * FROM order_items WHERE order_id = ANY($1);

JOIN approach हर item के लिए order data duplicate करता है। अगर हर order में 20 items हैं और order row wide है, तो 20x data transfer। दो-query approach हर order exactly एक बार भेजता है।

मेरा नियम: one-to-one relationships के लिए JOINs इस्तेमाल करो, one-to-many के लिए separate queries consider करो जब "one" side wide हो। लेकिन हमेशा benchmark करो — दूसरी query का network round-trip अक्सर duplicate data से ज़्यादा cost करता है।

Common Query Rewrites#

कुछ queries slow इसलिए नहीं कि indexes missing हैं बल्कि कैसे लिखी गई हैं। PostgreSQL का planner अच्छा है, लेकिन जादू नहीं है।

Subquery vs JOIN vs CTE#

ये तीन approaches बहुत different plans produce कर सकते हैं:

sql

-- WHERE में subquery — अक्सर ठीक, कभी-कभी terrible
SELECT * FROM orders
WHERE customer_id IN (
  SELECT id FROM customers WHERE country = 'US'
);
 
-- JOIN — usually सबसे अच्छी choice
SELECT o.* FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'US';
 
-- CTE — PG 11 और below में materialized, PG 12+ में optimized
WITH us_customers AS (
  SELECT id FROM customers WHERE country = 'US'
)
SELECT o.* FROM orders o
JOIN us_customers uc ON uc.id = o.customer_id;

PostgreSQL 12+ में, CTEs usually inlined होते हैं (planner उन्हें subqueries जैसा treat करता है), तो performance identical है। लेकिन PostgreSQL 11 और below में, CTEs optimization fences हैं — planner उन्हें materialize करता है और predicates push through नहीं कर सकता। अगर अभी भी PG 11 पर हो, performance-critical queries के लिए CTEs avoid करो।

EXISTS vs IN vs JOIN#

यह बार-बार आता है:

sql

-- EXISTS — पहले match पर रुक जाता है, existence check के लिए great
SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM returns r WHERE r.order_id = o.id
);
 
-- IN subquery के साथ — पहले पूरा result set बनाता है
SELECT * FROM orders
WHERE id IN (SELECT order_id FROM returns);
 
-- JOIN — अगर returns में per order multiple rows हैं तो duplicates produce कर सकता है
SELECT DISTINCT o.* FROM orders o
JOIN returns r ON r.order_id = o.id;

बड़ी tables के लिए, EXISTS अक्सर जीतता है क्योंकि short-circuit करता है। IN version को filtering से पहले return order_ids की पूरी list बनानी पड़ती है। JOIN version duplicates produce कर सकता है, जिसके लिए DISTINCT चाहिए जो sort या hash step add करता है।

मेरा default: related rows की existence check करते समय EXISTS इस्तेमाल करो। Semantically सबसे clear और usually सबसे fast।

लेकिन एक counterexample है। अगर subquery result set छोटा है और multiple conditions के लिए चाहिए:

sql

-- अगर subquery कम rows return करता है, IN बिल्कुल ठीक है
SELECT * FROM orders
WHERE customer_id IN (1, 2, 3, 4, 5);

SELECT * से बचो#

यह सिर्फ code style preference नहीं है। Real performance implications हैं:

sql

-- बुरा: सभी 25 columns fetch करता है जिसमें kilobytes data वाला TEXT column शामिल
SELECT * FROM products WHERE category = 'electronics';
 
-- अच्छा: सिर्फ वही fetch करो जो चाहिए
SELECT id, name, price FROM products WHERE category = 'electronics';

SELECT * के साथ:

Index-only scans इस्तेमाल नहीं हो सकते (सभी columns index में होने चाहिए)
Network पर ज़्यादा data transfer
Sorting और hashing के लिए ज़्यादा memory
अगर कोई बाद में 10MB BYTEA column add करे, existing queries silently slower हो जाती हैं

Window Functions vs Subqueries#

Window functions PostgreSQL की सबसे powerful features में से हैं और लगभग हमेशा correlated subqueries से outperform करती हैं:

sql

-- Slow: correlated subquery हर row के लिए एक बार run होती है
SELECT
  o.*,
  (SELECT COUNT(*) FROM order_items i WHERE i.order_id = o.id) AS item_count,
  (SELECT SUM(amount) FROM payments p WHERE p.order_id = o.id) AS total_paid
FROM orders o
WHERE o.status = 'completed';
 
-- Fast: window functions single pass में compute करती हैं
SELECT
  o.*,
  COUNT(i.id) OVER (PARTITION BY o.id) AS item_count,
  SUM(p.amount) OVER (PARTITION BY o.id) AS total_paid
FROM orders o
LEFT JOIN order_items i ON i.order_id = o.id
LEFT JOIN payments p ON p.order_id = o.id
WHERE o.status = 'completed';

एक और common pattern — per group latest row लाना:

sql

-- Slow: correlated subquery
SELECT * FROM notifications n
WHERE n.created_at = (
  SELECT MAX(created_at) FROM notifications
  WHERE user_id = n.user_id
);
 
-- Fast: DISTINCT ON (PostgreSQL-specific, बहुत useful)
SELECT DISTINCT ON (user_id) *
FROM notifications
ORDER BY user_id, created_at DESC;
 
-- यह भी fast: ROW_NUMBER window function
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM notifications
) sub
WHERE rn = 1;

DISTINCT ON इस pattern के लिए मेरा go-to है। Concise, readable, और PostgreSQL सही index से अच्छे से optimize करता है:

sql

CREATE INDEX idx_notifications_user_latest
ON notifications (user_id, created_at DESC);

Pagination सही तरीके से#

बड़े datasets के लिए OFFSET trap है:

sql

-- Page 1: fast
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 0;
 
-- Page 500: slow — PostgreSQL को 9,980 rows scan और discard करनी पड़ती हैं
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 9980;
 
-- Page 5000: बहुत slow — 20 return करने के लिए 99,980 rows scan
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 99980;

Fix keyset pagination है (cursor-based pagination भी कहते हैं):

sql

-- First page
SELECT * FROM products ORDER BY id LIMIT 20;
 
-- Next page: पिछले page की last id इस्तेमाल करो
SELECT * FROM products WHERE id > 12345 ORDER BY id LIMIT 20;

यह हमेशा fast है क्योंकि index से directly सही position पर jump करता है, चाहे कोई भी "page" हो। Trade-off यह है कि arbitrary page number पर jump नहीं कर सकते, लेकिन infinite scroll या "next page" UIs के लिए, keyset pagination strictly superior है।

Complex sort orders के लिए:

sql

-- Multiple sort columns के साथ keyset pagination
SELECT * FROM products
WHERE (price, id) > (29.99, 12345)
ORDER BY price, id
LIMIT 20;

यह row-value comparison इस्तेमाल करता है, जो PostgreSQL (price, id) पर composite index से efficiently handle करता है।

Table Statistics और Vacuuming#

PostgreSQL का query planner आपके data के statistics पर based decisions लेता है। Bad statistics lead to bad plans। इतना simple है।

ANALYZE: Statistics Update करो#

sql

-- Single table analyze करो
ANALYZE orders;
 
-- पूरा database analyze करो
ANALYZE;
 
-- Column की current statistics देखो
SELECT
  tablename, attname, n_distinct, most_common_vals, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

correlation value particularly interesting है। -1 से 1 तक range करती है और measure करती है कि rows का physical order column के logical order से कितना match करता है। 1 या -1 के करीब correlation मतलब data physically उस column पर sorted है, range scans बहुत efficient बनाता है (sequential I/O)। 0 के करीब correlation मतलब range queries के लिए random I/O।

sql

-- Skewed distributions वाले columns के लिए statistics target बढ़ाओ
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

Default statistics target 100 है, मतलब PostgreSQL 300 * 100 = 30,000 rows sample करता है। बहुत distinct values या skewed distributions वाले columns के लिए, इसे 500 या 1000 तक बढ़ाने से planner को बेहतर data मिलता है slightly longer ANALYZE times की cost पर।

Autovacuum Tuning#

Autovacuum दो काम करता है: dead tuples (deleted या updated rows) से space reclaim करता है और statistics update करता है। Busy tables पर, default autovacuum settings अक्सर बहुत conservative हैं।

sql

-- Autovacuum status check करो
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  last_vacuum,
  last_autovacuum,
  last_analyze,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

अगर tables दिखें जिनमें millions dead tuples हैं और last vacuum घंटों पहले था, autovacuum पीछे रह रहा है।

High-churn tables (sessions, job queues, या metrics जैसी) के लिए, मैं per-table autovacuum settings set करता हूं:

sql

-- High-write tables के लिए ज़्यादा aggressive autovacuum
ALTER TABLE sessions SET (
  autovacuum_vacuum_scale_factor = 0.01,     -- 1% dead tuples पर vacuum (default 20%)
  autovacuum_analyze_scale_factor = 0.005,   -- 0.5% changes पर analyze (default 10%)
  autovacuum_vacuum_cost_delay = 2           -- faster vacuum (newer PG में default 2ms)
);

Table Bloat#

जब PostgreSQL row update करता है, row in place modify नहीं करता — new version बनाता है और old one dead mark करता है। Vacuum dead rows reclaim करता है, लेकिन space सिर्फ उसी table reuse करती है। Disk पर table file shrink नहीं होती।

समय के साथ, heavily updated table में significant bloat हो सकता है — table disk पर live data की ज़रूरत से बहुत बड़ी है। इसका मतलब ज़्यादा pages scan करने, ज़्यादा I/O, ज़्यादा cache pressure।

sql

-- Table bloat estimate करो (simplified version)
SELECT
  current_database(),
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(
    pg_total_relation_size(schemaname || '.' || tablename) -
    pg_relation_size(schemaname || '.' || tablename)
  ) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;

Severely bloated tables के लिए, nuclear option VACUUM FULL है — पूरी table rewrite करता है। लेकिन exclusive lock लेता है, तो downtime बिना live production table पर नहीं कर सकते। Better approach pg_repack है, जो same काम बिना locking करता है:

bash

pg_repack --table orders --no-kill-backend -d mydb

Connection Pooling#

बहुत से developers को यह surprise करता है: PostgreSQL connections expensive हैं। हर connection new process spawn करता है (thread नहीं), करीब 5-10 MB memory consume करता है, और nontrivial fork overhead है।

Default max_connections 100 है। अगर application server में 20 workers हैं, हर एक 5 connections open करता है, आप already limit पर हो। Background job processor, monitoring tool, और कहीं running migration add करो, और musibat में हो।

Connection Pooler क्यों चाहिए#

Pooler के बिना, अगर application को 500 concurrent requests handle करने हैं, 500 PostgreSQL connections चाहिए। सिर्फ connection overhead के लिए 5 GB memory, और कुछ सौ connections के बाद process management overhead से PostgreSQL की performance significantly degrade होती है।

PgBouncer PostgreSQL के सामने रखने से, वो 500 application connections maybe 20 actual PostgreSQL connections पर map होते हैं। जब सभी database connections busy हों तो pooler requests queue करता है।

PgBouncer Configuration#

ini

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
 
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
 
; Pool mode
pool_mode = transaction
 
; Pool sizing
default_pool_size = 20
max_client_conn = 500
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
 
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 30

Transaction vs Session Mode#

Transaction mode (pool_mode = transaction): हर transaction के बाद connections pool में return होते हैं। 95% समय आपको यही चाहिए। Maximum connection reuse।
Session mode (pool_mode = session): पूरे client session तक connections hold रहते हैं। अगर prepared statements, SET commands, LISTEN/NOTIFY, या दूसरे session-level features चाहिए तो यह इस्तेमाल करो।
Statement mode (pool_mode = statement): हर statement के बाद connections return होते हैं। ज़्यादातर applications के लिए बहुत restrictive — explicit transactions भी इस्तेमाल नहीं कर सकते।

Transaction mode की पकड़: prepared statements इस्तेमाल नहीं कर सकते (session-level state हैं), session variables के लिए SET नहीं कर सकते, और LISTEN/NOTIFY expected तरीके से काम नहीं करेगा। ज़्यादातर ORMs में prepared statements disable करने का तरीका है।

Node.js में pg driver के साथ:

javascript

const pool = new Pool({
  host: '127.0.0.1',
  port: 6432,  // PgBouncer port, PostgreSQL port नहीं
  database: 'mydb',
  max: 20,     // PgBouncer default_pool_size से match करता है
  statement_timeout: 30000,
  // PgBouncer transaction mode के लिए prepared statements disable करो
  prepare: false,
});

Pool Sizing Formula#

Optimal PostgreSQL connection pool size का common formula है:

optimal_connections = (core_count * 2) + effective_spindle_count

4 cores और SSD (1 effective spindle) वाले modern server के लिए:

optimal = (4 * 2) + 1 = 9

Counterintuitively कम लगता है। लेकिन PostgreSQL ज़्यादातर modern hardware पर CPU-bound है (SSDs इतने fast हैं कि I/O rarely bottleneck करता है)। इससे ज़्यादा connections context switching overhead पैदा करते हैं जो actually चीज़ों को slow करता है।

Practice में, मैं usually default_pool_size इसका 2-3x set करता हूं burst traffic handle करने के लिए, समझते हुए कि peak load पर कुछ queries PgBouncer queue में wait करेंगी बजाय सब एक साथ PostgreSQL hit करने के।

Practical Checklist: हर Slow Query के लिए Exact Steps#

"यह query slow है" report मिलने पर मेरी actual process यह है। ये steps इसी order में, हर बार follow करता हूं।

Step 1: Actual Query लो#

"Endpoint slow है" नहीं — actual SQL। अगर ORM इस्तेमाल कर रहे हो, query logging enable करो:

sql

-- Temporarily 100ms से ज़्यादा वाली सभी queries log करो
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();

या total time से top queries के लिए pg_stat_statements check करो:

sql

-- Total execution time से top 10 queries
SELECT
  queryid,
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

pg_stat_statements PostgreSQL performance के लिए single most valuable extension है। अगर run नहीं कर रहे, अभी install करो:

sql

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Step 2: EXPLAIN (ANALYZE, BUFFERS) Run करो#

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) <slow query>;

ढूंढो:

Bad row estimates — actual rows estimated rows से बहुत different
बड़ी tables पर Seq Scans — potential missing index
बहुत rows वाले Nested Loops — Hash Join या Merge Join होना चाहिए
High buffer reads — cold cache या table बहुत बड़ी
Disk पर spill होने वाली Sort operations — work_mem बढ़ाओ या sort के लिए index add करो

Step 3: Table Statistics Check करो#

sql

-- Stats last कब update हुए?
SELECT
  relname,
  last_analyze,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

अगर last_analyze पुराना है या n_dead_tup n_live_tup के मुकाबले high है, run करो:

sql

ANALYZE orders;

फिर EXPLAIN ANALYZE फिर से run करो। अगर plan बदलता है, stale statistics problem थे।

Step 4: Existing Indexes Check करो#

sql

-- इस table पर कौन से indexes हैं?
SELECT
  indexname,
  indexdef,
  idx_scan,
  idx_tup_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
JOIN pg_indexes ON pg_stat_user_indexes.indexrelname = pg_indexes.indexname
WHERE tablename = 'orders'
ORDER BY idx_scan DESC;

शायद index exist करता है लेकिन इस्तेमाल नहीं हो रहा। शायद query के लिए column order गलत है।

Step 5: Indexes बनाओ या Modify करो#

Query plan के आधार पर, appropriate index बनाओ। पहले और बाद में EXPLAIN ANALYZE से test करो।

sql

-- Index concurrently बनाओ (table lock नहीं करता)
CREATE INDEX CONCURRENTLY idx_orders_customer_status
ON orders (customer_id, status)
WHERE status IN ('pending', 'processing');
 
-- Verify करो index इस्तेमाल हो रहा है
EXPLAIN (ANALYZE, BUFFERS) <slow query>;

Production tables पर हमेशा CONCURRENTLY इस्तेमाल करो। Regular CREATE INDEX full table lock लेता है जो सभी writes block करता है।

Step 6: Query Rewrites Consider करो#

अगर index exist करता है और statistics fresh हैं लेकिन query अभी भी slow है, query खुद देखो:

क्या subquery JOIN में rewrite हो सकती है?
क्या OFFSET problem cause कर रहा है? Keyset pagination पर switch करो
ज़रूरत से ज़्यादा columns select कर रहे हो?
क्या correlated subquery window function बन सकती है?
क्या CTE planner को optimize करने से रोक रहा है?

Step 7: Server Settings Check करो#

Specific query patterns के लिए, server settings matter करती हैं:

sql

-- work_mem: sorts और hash joins के लिए memory (per operation, per connection नहीं)
-- Default 4MB है, जो complex queries के लिए बहुत कम है
SET work_mem = '64MB';  -- यह try करो और EXPLAIN ANALYZE फिर से run करो
 
-- effective_cache_size: planner को बताता है कितना disk cache expect करे
-- Total RAM का ~75% set करो
SHOW effective_cache_size;
 
-- random_page_cost: random I/O से sequential I/O का ratio
-- Default 4.0 है, SSDs के लिए 1.1-1.5 set करो
SHOW random_page_cost;

अगर work_mem बदलने से sort-to-disk problem fix होती है, globally बढ़ाने पर विचार करो। लेकिन सावधान — per-operation है, per-connection नहीं। 10 sort operations और work_mem = 256MB वाली complex query single query के लिए 2.5 GB इस्तेमाल कर सकती है।

Step 8: Fix के बाद Monitor करो#

बस fix करके भूलो मत। Verify करो fix hold करती है:

sql

-- Changes के बाद fresh data देखने के लिए pg_stat_statements reset करो
SELECT pg_stat_statements_reset();
 
-- एक घंटे/दिन बाद check करो
SELECT
  queryid,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round(max_exec_time::numeric, 2) AS max_time_ms,
  query
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY mean_exec_time DESC;

Bonus: Quick Wins जो मैं हर नए Database पर Apply करता हूं#

ये settings और practices मैं हर PostgreSQL database पर setup करता हूं, कोई performance issues आने से पहले:

sql

-- 1. pg_stat_statements install करो
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
 
-- 2. Slow query logging enable करो (50ms मेरा threshold है)
ALTER SYSTEM SET log_min_duration_statement = 50;
 
-- 3. SSD-appropriate settings
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;
 
-- 4. Anemic default से work_mem बढ़ाओ
ALTER SYSTEM SET work_mem = '32MB';
 
-- 5. effective_cache_size RAM का 75% set करो
-- 16GB server के लिए:
ALTER SYSTEM SET effective_cache_size = '12GB';
 
-- 6. shared_buffers RAM का 25% set करो
-- 16GB server के लिए:
ALTER SYSTEM SET shared_buffers = '4GB';
 
-- 7. Autovacuum responsiveness improve करो
ALTER SYSTEM SET autovacuum_naptime = '10s';
 
-- Changes apply करो
SELECT pg_reload_conf();
-- Note: shared_buffers restart चाहिए

और monitoring query जो मैं weekly run करता हूं:

sql

-- जिन tables को attention चाहिए
SELECT
  schemaname || '.' || relname AS table,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  n_live_tup,
  n_dead_tup,
  CASE WHEN n_live_tup > 0
    THEN round(100.0 * n_dead_tup / n_live_tup, 1)
    ELSE 0
  END AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

अंतिम विचार#

PostgreSQL query optimization dark art नहीं है। यह systematic process है:

Measure करो — guess मत करो। EXPLAIN (ANALYZE, BUFFERS) आपका best friend है।
Plan समझो — scan types, join types, और row estimates पढ़ना सीखो।
Strategically index करो — right columns पर right order में right index। Partial indexes और covering indexes underused superpowers हैं।
Better queries लिखो — existence checks के लिए EXISTS over IN, OFFSET की जगह keyset pagination, correlated subqueries की जगह window functions।
Database maintain करो — autovacuum tuning, statistics updates, connection pooling।
Continuously monitor करो — pg_stat_statements बताता है database अपना time कहां खर्च करता है। Regularly check करो।

4 seconds लेने वाली query और 0.3 milliseconds लेने वाली query के बीच फर्क rarely hardware है। लगभग हमेशा knowledge है — जानना कहां देखना है और क्या बदलना है। और अब आपको पता है कहां देखना है।