Window functions are the single most underused SQL feature. Running totals, rankings, moving averages, gap analysis, and session detection — all without subqueries. Real examples from production databases.
I spent five years writing SQL before I truly learned window functions. Five years of correlated subqueries, self-joins, temporary tables, and application-layer gymnastics to solve problems that window functions handle in a single, readable expression. When I finally sat down and learned them properly, I had a genuine "why did nobody tell me about this" moment. I went back to old code and rewrote queries that had been bothering me for years. Half of them got shorter. Most of them got faster. A few went from "we run this overnight in a batch job" to "this returns in 400 milliseconds."
Window functions are the most powerful feature in SQL that most developers skip. They show up in interview prep, get filed under "advanced," and then ignored in day-to-day work. That's a mistake. Once you understand them, you'll reach for them constantly — running totals, rankings, moving averages, detecting gaps, sessionizing user events, computing year-over-year growth. All without subqueries. All in a single pass over the data.
This post is everything I wish someone had told me about window functions five years ago. Not the documentation. The real patterns, the real use cases, and the performance traps that don't show up in tutorials.
Here's the conceptual shift that took me too long to grasp: a regular aggregate function like SUM() or COUNT() collapses rows. You put in 1,000 rows, you get back 1 (or a few, with GROUP BY). A window function computes across a set of rows but keeps every row in the result. That's it. That's the whole idea.
-- Regular aggregate: collapses rows
SELECT department, SUM(salary)
FROM employees
GROUP BY department;
-- Result: one row per department
-- Window function: keeps all rows, adds the computation
SELECT name, department, salary,
SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;
-- Result: every employee row, with their department's total attached

The OVER() clause is what makes it a window function. Everything inside that clause defines the "window" — which rows the function can see when computing the result for the current row.
Think of it like this: for each row, the database opens a window onto a subset of the data, runs the function across that window, and writes the result back onto the current row. The window slides. The rows stay.
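The collapse-versus-keep distinction is easy to verify for yourself. Here's a minimal sketch using Python's built-in sqlite3 module (SQLite supports window functions as of 3.25); the tiny employees table is invented for illustration:

```python
import sqlite3

# Toy data, invented for illustration; requires SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Engineering', 145000),
        ('Bob',   'Engineering', 132000),
        ('Eve',   'Sales',        98000);
""")

# Regular aggregate: 3 input rows collapse to one row per department.
collapsed = conn.execute(
    "SELECT department, SUM(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(collapsed)  # [('Engineering', 277000), ('Sales', 98000)]

# Window function: all 3 rows survive, each carrying its department total.
windowed = conn.execute(
    "SELECT name, SUM(salary) OVER (PARTITION BY department) "
    "FROM employees ORDER BY name"
).fetchall()
print(windowed)  # [('Alice', 277000), ('Bob', 277000), ('Eve', 98000)]
```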
Every window function has the same structure:
function_name(...) OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
[frame_clause]
)

PARTITION BY splits the data into independent groups (like GROUP BY, but without collapsing). Each partition is processed separately.
ORDER BY defines the order within each partition. This matters a lot — it affects what LAG/LEAD return, how ROW_NUMBER assigns numbers, and critically, it changes the default frame for aggregate window functions.
Frame clause defines exactly which rows within the partition the function can see for the current row. This is where most people get confused, and where most bugs hide. More on this later.
Most developers encounter window functions through ranking. You need the "top 3 products per category" or "the most recent order per customer." This is where ROW_NUMBER, RANK, and DENSE_RANK live.
Assigns a unique sequential integer to each row within a partition:
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees;

name | department | salary | rank
------------|------------|---------|-----
Alice | Engineering| 145000 | 1
Bob | Engineering| 132000 | 2
Carol | Engineering| 128000 | 3
Dave | Engineering| 115000 | 4
Eve | Sales | 98000 | 1
Frank | Sales | 92000 | 2
Grace | Sales | 88000 | 3
The classic pattern — "top N per group" — uses this in a subquery or CTE:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;

This is one of the most common SQL interview questions, and also one of the most common real-world patterns. Before window functions, you'd solve this with a correlated subquery:
-- The old way (don't do this)
SELECT * FROM employees e1
WHERE (
SELECT COUNT(*) FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
) < 3;

That correlated subquery runs once per row in the outer query. On a table with 100,000 employees, that's 100,000 subqueries. The window function version does a single pass with a sort per partition.
The difference only matters when there are ties:
SELECT
name, score,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
RANK() OVER (ORDER BY score DESC) as rank,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank
FROM players;

name | score | row_num | rank | dense_rank
--------|-------|---------|------|----------
Alice | 100 | 1 | 1 | 1
Bob | 100 | 2 | 1 | 1
Carol | 95 | 3 | 3 | 2
Dave | 90 | 4 | 4 | 3
Eve | 90 | 5 | 4 | 3
Frank | 85 | 6 | 6 | 4
My rule of thumb: use ROW_NUMBER when you need exactly N results (pagination, dedup). Use RANK for competition-style rankings, where ties share a position and the next position is skipped. Use DENSE_RANK when downstream logic needs consecutive numbering with no gaps (tier or bucket assignments, for example).
NTILE divides the partition into N roughly equal buckets:
SELECT
name, salary,
NTILE(4) OVER (ORDER BY salary) as quartile
FROM employees;

name | salary | quartile
--------|---------|--------
Dave | 55000 | 1
Eve | 62000 | 1
Carol | 71000 | 2
Bob | 85000 | 2
Alice | 92000 | 3
Frank | 98000 | 3
Grace | 115000 | 4
Henry | 145000 | 4
I use this constantly for creating distribution buckets — salary quartiles, performance tiers, customer segments. The "roughly equal" part matters: if you have 10 rows and ask for 4 buckets, you get groups of 3, 3, 2, 2. Not always perfectly even.
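You can check the bucket-size arithmetic directly. A sketch with Python's sqlite3 (the values 0 through 9 are arbitrary stand-ins):

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

# NTILE(4) over 10 rows: the remainder rows go to the earliest buckets.
buckets = [row[0] for row in conn.execute("SELECT NTILE(4) OVER (ORDER BY v) FROM t")]
sizes = Counter(buckets)
print(sorted(sizes.items()))  # [(1, 3), (2, 3), (3, 2), (4, 2)]
```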
This is where window functions start getting genuinely exciting. LAG looks at the previous row. LEAD looks at the next row. Within the partition, according to the ORDER BY.
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
LEAD(revenue, 1) OVER (ORDER BY date) as next_day_revenue
FROM daily_revenue;

date | revenue | prev_day_revenue | next_day_revenue
-----------|---------|------------------|------------------
2026-01-01 | 12500 | NULL | 13200
2026-01-02 | 13200 | 12500 | 11800
2026-01-03 | 11800 | 13200 | 14100
2026-01-04 | 14100 | 11800 | 13900
2026-01-05 | 13900 | 14100 | NULL
The second argument is the offset (default 1). You can also provide a third argument as a default value instead of NULL:
LAG(revenue, 1, 0) OVER (ORDER BY date) -- returns 0 instead of NULL

SELECT
date,
revenue,
revenue - LAG(revenue) OVER (ORDER BY date) as daily_change,
ROUND(
100.0 * (revenue - LAG(revenue) OVER (ORDER BY date))
/ LAG(revenue) OVER (ORDER BY date),
2
) as pct_change
FROM daily_revenue;

date | revenue | daily_change | pct_change
-----------|---------|--------------|----------
2026-01-01 | 12500 | NULL | NULL
2026-01-02 | 13200 | 700 | 5.60
2026-01-03 | 11800 | -1400 | -10.61
2026-01-04 | 14100 | 2300 | 19.49
2026-01-05 | 13900 | -200 | -1.42
Before window functions, this required a self-join:
-- The painful way
SELECT
a.date,
a.revenue,
a.revenue - b.revenue as daily_change
FROM daily_revenue a
LEFT JOIN daily_revenue b
ON b.date = a.date - INTERVAL '1 day';

That self-join works fine until your data has gaps (weekends, holidays). Then you need to find the "previous existing date" instead of "yesterday," and the join gets ugly fast. LAG handles gaps naturally because it looks at the previous row in the result set, not the previous calendar day.
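Here's a quick way to see the difference, sketched with Python's sqlite3 and two invented rows that straddle a weekend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE rev (d TEXT, revenue INTEGER);
    -- A Friday and the following Monday: no rows for the weekend
    INSERT INTO rev VALUES ('2026-01-02', 100), ('2026-01-05', 120);
""")

# LAG: "previous row in the ordered result", so Monday sees Friday's value.
lag_prev = conn.execute(
    "SELECT LAG(revenue) OVER (ORDER BY d) FROM rev ORDER BY d").fetchall()
print(lag_prev)   # [(None,), (100,)]

# Calendar self-join: Monday looks for Jan 4, which has no row, and gets NULL.
join_prev = conn.execute("""
    SELECT b.revenue
    FROM rev a LEFT JOIN rev b ON b.d = date(a.d, '-1 day')
    ORDER BY a.d""").fetchall()
print(join_prev)  # [(None,), (None,)]
```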
This is a pattern I use in almost every analytics dashboard:
SELECT
date,
revenue,
LAG(revenue, 365) OVER (ORDER BY date) as revenue_last_year,
ROUND(
100.0 * (revenue - LAG(revenue, 365) OVER (ORDER BY date))
/ NULLIF(LAG(revenue, 365) OVER (ORDER BY date), 0),
2
) as yoy_growth_pct
FROM daily_revenue
WHERE date >= '2025-01-01';

One caveat: LAG(revenue, 365) counts rows, not calendar days, so this only works when the table has exactly one row per day (and it ignores leap years). Or more commonly, month-over-month with monthly aggregates:
WITH monthly AS (
SELECT
DATE_TRUNC('month', date) as month,
SUM(revenue) as total_revenue
FROM daily_revenue
GROUP BY DATE_TRUNC('month', date)
)
SELECT
month,
total_revenue,
LAG(total_revenue) OVER (ORDER BY month) as prev_month,
ROUND(
100.0 * (total_revenue - LAG(total_revenue) OVER (ORDER BY month))
/ NULLIF(LAG(total_revenue) OVER (ORDER BY month), 0),
2
) as mom_growth_pct
FROM monthly
ORDER BY month;

month | total_revenue | prev_month | mom_growth_pct
-----------|---------------|------------|---------------
2025-10-01 | 385000 | NULL | NULL
2025-11-01 | 412000 | 385000 | 7.01
2025-12-01 | 498000 | 412000 | 20.87
2026-01-01 | 376000 | 498000 | -24.50
2026-02-01 | 401000 | 376000 | 6.65
These return the first or last value in the window frame:
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
) as highest_paid,
LAST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_paid
FROM employees;

Notice the frame clause on LAST_VALUE. This is one of the most common window function bugs. Without it, the default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE returns the current row's value — completely useless. You almost always need to explicitly set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING with LAST_VALUE.
I've seen this bug in production code more times than I can count. Someone writes LAST_VALUE, gets weird results, assumes the function is broken, and rewrites it with a subquery. The function is fine. The default frame is just surprising.
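The bug is easy to reproduce. A sketch with Python's sqlite3 and three invented salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE emp (name TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('Alice', 145000), ('Bob', 132000), ('Dave', 115000);
""")

# Broken: the default frame stops at CURRENT ROW, so every row is its own "last".
broken = [r[0] for r in conn.execute(
    "SELECT LAST_VALUE(name) OVER (ORDER BY salary DESC) FROM emp "
    "ORDER BY salary DESC")]
print(broken)  # ['Alice', 'Bob', 'Dave']

# Fixed: extend the frame to the end of the partition.
fixed = [r[0] for r in conn.execute(
    "SELECT LAST_VALUE(name) OVER (ORDER BY salary DESC "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM emp "
    "ORDER BY salary DESC")]
print(fixed)   # ['Dave', 'Dave', 'Dave']
```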
This is where window functions go from "nice shortcut" to "I literally cannot imagine doing this without them."
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;

date | amount | running_total
-----------|--------|-------------
2026-01-01 | 150 | 150
2026-01-02 | -30 | 120
2026-01-03 | 200 | 320
2026-01-04 | 75 | 395
2026-01-05 | -50 | 345
That's it. One line. The running total accumulates as it goes through the ordered rows. Without window functions:
-- The old way: correlated subquery
SELECT
t1.date,
t1.amount,
(SELECT SUM(t2.amount)
FROM transactions t2
WHERE t2.date <= t1.date) as running_total
FROM transactions t1
ORDER BY t1.date;

The correlated subquery is O(n^2). For 100,000 rows, that's 10 billion operations. The window function is O(n log n) — dominated by the sort. On a table with 500,000 transactions, I've seen the difference go from 45 seconds to 200 milliseconds.
SELECT
date,
category,
amount,
SUM(amount) OVER (
PARTITION BY category
ORDER BY date
) as category_running_total
FROM transactions
ORDER BY category, date;

You can also combine several cumulative metrics in one query:

SELECT
date,
new_users,
SUM(new_users) OVER (ORDER BY date) as total_users,
COUNT(*) OVER (ORDER BY date) as day_number,
ROUND(
AVG(new_users) OVER (ORDER BY date),
2
) as avg_daily_signups
FROM user_signups;

date | new_users | total_users | day_number | avg_daily_signups
-----------|-----------|-------------|------------|------------------
2026-01-01 | 45 | 45 | 1 | 45.00
2026-01-02 | 52 | 97 | 2 | 48.50
2026-01-03 | 38 | 135 | 3 | 45.00
2026-01-04 | 67 | 202 | 4 | 50.50
2026-01-05 | 41 | 243 | 5 | 48.60
The frame clause is the part of window functions most people skip, and it's the part that unlocks the most interesting patterns. The frame defines exactly which rows the window function considers for the current row.
{ ROWS | RANGE | GROUPS } BETWEEN frame_start AND frame_end

Where frame_start and frame_end can be:

UNBOUNDED PRECEDING — the first row of the partition
N PRECEDING — N rows/values before the current row
CURRENT ROW — the current row
N FOLLOWING — N rows/values after the current row
UNBOUNDED FOLLOWING — the last row of the partition

The defaults are critical and trip up even experienced SQL developers. Without ORDER BY, the default frame is the entire partition (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That second default is why SUM() OVER (ORDER BY date) gives you a running total — the frame starts at the beginning of the partition and ends at the current row.
But there's a subtle gotcha with RANGE. RANGE groups rows that have the same ORDER BY value. If two transactions have the same date, they'll both see the sum including both of them:
-- Sample data with duplicate dates
-- 2026-01-01: 100, 2026-01-01: 50, 2026-01-02: 75
-- With RANGE (default when using ORDER BY):
SUM(amount) OVER (ORDER BY date)
-- Row 1 (Jan 1, 100): running_total = 150 (includes both Jan 1 rows!)
-- Row 2 (Jan 1, 50): running_total = 150 (same! both Jan 1 rows included)
-- Row 3 (Jan 2, 75): running_total = 225
-- With ROWS:
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Row 1 (Jan 1, 100): running_total = 100
-- Row 2 (Jan 1, 50): running_total = 150
-- Row 3 (Jan 2, 75): running_total = 225

If your ORDER BY column has unique values (timestamps down to the millisecond, sequential IDs), ROWS and RANGE behave identically. If it might have duplicates, you need to think carefully about which behavior you want.
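The peer-grouping behavior is worth verifying on real rows. A sketch with Python's sqlite3, using the same three invented transactions; rowid is added as a tiebreaker so the ROWS version is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE tx (d TEXT, amount INTEGER);
    INSERT INTO tx VALUES ('2026-01-01', 100), ('2026-01-01', 50), ('2026-01-02', 75);
""")

# Default frame (RANGE): both Jan 1 rows are peers, so both see 150.
range_totals = [r[0] for r in conn.execute(
    "SELECT SUM(amount) OVER (ORDER BY d) FROM tx ORDER BY d, rowid")]
print(range_totals)  # [150, 150, 225]

# ROWS with a rowid tiebreaker: a true row-by-row running total.
rows_totals = [r[0] for r in conn.execute(
    "SELECT SUM(amount) OVER (ORDER BY d, rowid "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM tx "
    "ORDER BY d, rowid")]
print(rows_totals)   # [100, 150, 225]
```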
GROUPS is less commonly used but useful. Instead of counting individual rows (ROWS) or value ranges (RANGE), it counts groups of peers — rows with the same ORDER BY value:
SUM(amount) OVER (ORDER BY date GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW)

This says "the current date group and the previous date group," regardless of how many rows each group has.
Moving averages are one of the most practical applications of frame clauses. They smooth out noise in time-series data.
SELECT
date,
revenue,
ROUND(
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
),
2
) as moving_avg_7d
FROM daily_revenue;

date | revenue | moving_avg_7d
-----------|---------|-------------
2026-01-01 | 12500 | 12500.00
2026-01-02 | 13200 | 12850.00
2026-01-03 | 11800 | 12500.00
2026-01-04 | 14100 | 12900.00
2026-01-05 | 13900 | 13100.00
2026-01-06 | 12200 | 12950.00
2026-01-07 | 15100 | 13257.14
2026-01-08 | 14800 | 13585.71
2026-01-09 | 13500 | 13628.57
2026-01-10 | 12900 | 13785.71
Notice: 6 PRECEDING AND CURRENT ROW gives you 7 rows total. The first 6 rows have fewer than 7 rows of history, so the average is over fewer data points. If you want to exclude incomplete windows:
SELECT * FROM (
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d,
COUNT(*) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as window_size
FROM daily_revenue
) t
WHERE window_size = 7;

For statistical analysis, you might want a centered window — looking both backward and forward:
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) as centered_avg_7d
FROM daily_revenue;

Same pattern, different function. "Total revenue in the last 30 days" for each day:
SELECT
date,
revenue,
SUM(revenue) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) as rolling_30d_revenue
FROM daily_revenue;

Window functions pair beautifully with percentile functions for distribution analysis:
SELECT
name,
department,
salary,
PERCENT_RANK() OVER (ORDER BY salary) as pct_rank,
CUME_DIST() OVER (ORDER BY salary) as cumulative_dist,
NTILE(10) OVER (ORDER BY salary) as decile
FROM employees;

name | department | salary | pct_rank | cumulative_dist | decile
--------|-------------|---------|----------|-----------------|-------
Dave | Support | 52000 | 0.0000 | 0.1000 | 1
Eve | Support | 58000 | 0.1111 | 0.2000 | 2
Carol | Marketing | 65000 | 0.2222 | 0.3000 | 3
Frank | Marketing | 72000 | 0.3333 | 0.4000 | 4
Grace | Sales | 78000 | 0.4444 | 0.5000 | 5
Henry | Sales | 85000 | 0.5556 | 0.6000 | 6
Alice | Engineering | 92000 | 0.6667 | 0.7000 | 7
Bob | Engineering | 105000 | 0.7778 | 0.8000 | 8
Ivan | Engineering | 125000 | 0.8889 | 0.9000 | 9
Judy | Engineering | 145000 | 1.0000 | 1.0000 | 10
A real-world example — identifying where each employee falls within their department's salary distribution:
SELECT
name,
department,
salary,
MIN(salary) OVER w as dept_min,
MAX(salary) OVER w as dept_max,
ROUND(
100.0 * (salary - MIN(salary) OVER w)
/ NULLIF(MAX(salary) OVER w - MIN(salary) OVER w, 0),
1
) as dept_position_pct
FROM employees
WINDOW w AS (PARTITION BY department);That WINDOW clause at the bottom is a named window — you can define it once and reuse it. Cleaner than repeating OVER (PARTITION BY department) three times.
Sessionization is when window functions go from "useful" to "I don't know how I'd do this without them." The problem: you have a stream of user events with timestamps, and you need to group them into sessions. A new session starts when there's a gap of more than 30 minutes between events.
-- Step 1: Detect gaps
WITH gaps AS (
SELECT
user_id,
event_time,
event_type,
LAG(event_time) OVER (
PARTITION BY user_id ORDER BY event_time
) as prev_event_time,
CASE
WHEN event_time - LAG(event_time) OVER (
PARTITION BY user_id ORDER BY event_time
) > INTERVAL '30 minutes'
THEN 1
ELSE 0
END as is_new_session
FROM user_events
),
-- Step 2: Assign session IDs using a running sum of new-session flags
sessions AS (
SELECT
*,
SUM(is_new_session) OVER (
PARTITION BY user_id ORDER BY event_time
) + 1 as session_id
FROM gaps
)
SELECT
user_id,
session_id,
MIN(event_time) as session_start,
MAX(event_time) as session_end,
COUNT(*) as event_count,
MAX(event_time) - MIN(event_time) as session_duration
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_start;

user_id | session_id | session_start | session_end | event_count | session_duration
--------|------------|---------------------|---------------------|-------------|----------------
1001 | 1 | 2026-01-15 09:02:00 | 2026-01-15 09:28:00 | 12 | 00:26:00
1001 | 2 | 2026-01-15 14:15:00 | 2026-01-15 14:52:00 | 8 | 00:37:00
1001 | 3 | 2026-01-15 20:30:00 | 2026-01-15 20:45:00 | 5 | 00:15:00
1002 | 1 | 2026-01-15 10:00:00 | 2026-01-15 11:30:00 | 22 | 01:30:00
The trick is the running sum. Every time there's a gap > 30 minutes, is_new_session is 1. The cumulative sum of those flags gives you a session counter that increments at each gap. This is a pattern I use in every analytics project.
Try doing this without window functions. It's a nightmare of self-joins and correlated subqueries that's also an order of magnitude slower.
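If you want to convince yourself the running-sum trick works, here's a compressed sketch with Python's sqlite3, using epoch seconds in place of timestamps and a 30-minute (1800-second) threshold; the four events are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER)")  # ts = epoch seconds
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1001, 0), (1001, 600),        # two events 10 minutes apart
    (1001, 3600), (1001, 3900),    # a 50-minute gap starts a new session
])

sessions = conn.execute("""
    WITH gaps AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
                    THEN 1 ELSE 0 END AS is_new_session
        FROM events
    ),
    numbered AS (
        SELECT *, SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY ts) + 1
                  AS session_id
        FROM gaps
    )
    SELECT session_id, COUNT(*) FROM numbered
    GROUP BY user_id, session_id ORDER BY session_id
""").fetchall()
print(sessions)  # [(1, 2), (2, 2)] — two sessions of two events each
```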
Gap-and-island is a classic SQL problem family. You have a sequence of values and need to find contiguous groups (islands) separated by gaps.
Given a table of user login dates, find streaks of consecutive daily logins:
WITH login_data AS (
SELECT DISTINCT user_id, login_date::date as login_date
FROM user_logins
),
grouped AS (
SELECT
user_id,
login_date,
login_date - (ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
))::int as grp
FROM login_data
)
SELECT
user_id,
MIN(login_date) as streak_start,
MAX(login_date) as streak_end,
COUNT(*) as streak_length
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3 -- only streaks of 3+ days
ORDER BY user_id, streak_start;

The magic: if you have consecutive dates (Jan 1, Jan 2, Jan 3) and subtract a sequential row number (1, 2, 3), you get the same result for all of them (Dec 31, Dec 31, Dec 31). Non-consecutive dates produce different groups. It's elegant and fast.
user_id | streak_start | streak_end | streak_length
--------|--------------|-------------|-------------
1001 | 2026-01-03 | 2026-01-08 | 6
1001 | 2026-01-15 | 2026-01-18 | 4
1002 | 2026-01-01 | 2026-01-05 | 5
1002 | 2026-01-10 | 2026-01-13 | 4
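The date-minus-row-number trick is compact enough to test in a few lines. A sketch with Python's sqlite3, using integer day numbers in place of dates (the arithmetic is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE logins (day INTEGER)")  # day number stands in for a date
conn.executemany("INSERT INTO logins VALUES (?)",
                 [(d,) for d in [1, 2, 3, 7, 8, 12]])

# Consecutive days minus their row numbers collapse to the same group key.
streaks = conn.execute("""
    WITH grouped AS (
        SELECT day, day - ROW_NUMBER() OVER (ORDER BY day) AS grp
        FROM logins
    )
    SELECT MIN(day), MAX(day), COUNT(*) FROM grouped
    GROUP BY grp ORDER BY MIN(day)
""").fetchall()
print(streaks)  # [(1, 3, 3), (7, 8, 2), (12, 12, 1)]
```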
The inverse problem — find missing values in a sequence:
SELECT
invoice_number,
LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_invoice,
LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number as gap_size
FROM invoices
WHERE LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number > 1;

Actually, that won't work directly in a WHERE clause (window functions can't be in WHERE). Wrap it:
SELECT * FROM (
SELECT
invoice_number,
LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_invoice,
LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number as gap_size
FROM invoices
) t
WHERE gap_size > 1;

invoice_number | next_invoice | gap_size
---------------|--------------|--------
1047 | 1052 | 5
1089 | 1091 | 2
1203 | 1210 | 7
When you're using the same window definition multiple times, named windows keep things readable:
SELECT
date,
revenue,
SUM(revenue) OVER w as running_total,
AVG(revenue) OVER w as running_avg,
COUNT(*) OVER w as running_count,
MIN(revenue) OVER w as running_min,
MAX(revenue) OVER w as running_max
FROM daily_revenue
WINDOW w AS (ORDER BY date);You can also build on named windows:
SELECT
date,
category,
revenue,
SUM(revenue) OVER (w ORDER BY date) as running_total,
AVG(revenue) OVER (w ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d
FROM daily_revenue
WINDOW w AS (PARTITION BY category);

Let me show you some patterns I've used in actual production systems. These aren't toy examples.
You get duplicate records from an upstream data feed. Keep only the latest version of each record:
DELETE FROM raw_events
WHERE id IN (
SELECT id FROM (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY event_key
ORDER BY received_at DESC
) as rn
FROM raw_events
) t
WHERE rn > 1
);

Or for a read-only view of deduplicated data:
CREATE VIEW clean_events AS
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY event_key
ORDER BY received_at DESC
) as rn
FROM raw_events
) t
WHERE rn = 1;

Track conversion through a multi-step process:
WITH user_steps AS (
SELECT
user_id,
step_name,
MIN(created_at) as first_completed_at
FROM funnel_events
WHERE step_name IN ('signup', 'onboarding', 'first_purchase', 'subscription')
GROUP BY user_id, step_name
),
funnel AS (
SELECT
user_id,
step_name,
first_completed_at,
LAG(first_completed_at) OVER (
PARTITION BY user_id
ORDER BY first_completed_at
) as prev_step_time,
LAG(step_name) OVER (
PARTITION BY user_id
ORDER BY first_completed_at
) as prev_step_name
FROM user_steps
)
SELECT
step_name,
COUNT(*) as users_reached,
COUNT(prev_step_time) as users_from_prev_step,
ROUND(
AVG(EXTRACT(EPOCH FROM (first_completed_at - prev_step_time)) / 3600),
1
) as avg_hours_from_prev
FROM funnel
GROUP BY step_name
ORDER BY MIN(first_completed_at);

How long does it take users between purchases?
SELECT
user_id,
order_date,
LAG(order_date) OVER (
PARTITION BY user_id ORDER BY order_date
) as prev_order_date,
order_date - LAG(order_date) OVER (
PARTITION BY user_id ORDER BY order_date
) as days_between_orders,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY order_date
) as order_number
FROM orders;

user_id | order_date | prev_order_date | days_between | order_number
--------|------------|-----------------|--------------|-------------
1001 | 2025-06-15 | NULL | NULL | 1
1001 | 2025-08-22 | 2025-06-15 | 68 | 2
1001 | 2025-09-10 | 2025-08-22 | 19 | 3
1001 | 2025-12-01 | 2025-09-10 | 82 | 4
1002 | 2025-07-03 | NULL | NULL | 1
1002 | 2025-07-28 | 2025-07-03 | 25 | 2
Flag values that are more than 2 standard deviations from the rolling mean:
SELECT * FROM (
SELECT
date,
metric_value,
AVG(metric_value) OVER w as rolling_avg,
STDDEV(metric_value) OVER w as rolling_stddev,
CASE
WHEN ABS(metric_value - AVG(metric_value) OVER w)
> 2 * STDDEV(metric_value) OVER w
THEN true
ELSE false
END as is_anomaly
FROM metrics
WINDOW w AS (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
) t
WHERE is_anomaly;

Find which products account for 80% of revenue (the 80/20 rule):
WITH product_revenue AS (
SELECT
product_id,
product_name,
SUM(amount) as total_revenue
FROM order_items
GROUP BY product_id, product_name
),
ranked AS (
SELECT
*,
-- product_id as a tiebreaker keeps the default RANGE frame from
-- lumping products with identical revenue into one cumulative step
SUM(total_revenue) OVER (ORDER BY total_revenue DESC, product_id) as cumulative_revenue,
SUM(total_revenue) OVER () as grand_total,
ROUND(
100.0 * SUM(total_revenue) OVER (ORDER BY total_revenue DESC, product_id)
/ SUM(total_revenue) OVER (),
2
) as cumulative_pct,
ROW_NUMBER() OVER (ORDER BY total_revenue DESC, product_id) as rank
FROM product_revenue
)
SELECT * FROM ranked
WHERE cumulative_pct <= 80
ORDER BY rank;

Window functions are not free. Here's what I've learned about their performance characteristics the hard way.
Most window functions require sorting. PARTITION BY a ORDER BY b means the database needs the data sorted by (a, b). If you don't have an index that provides this order, that's a sort operation, and sorts are expensive on large datasets.
The fix: Create indexes that match your window function's PARTITION BY + ORDER BY:
-- For: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
-- For: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
CREATE INDEX idx_events_user_created ON events (user_id, created_at DESC);

Another trap: multiple window definitions with different sort orders in one query.

SELECT
ROW_NUMBER() OVER (ORDER BY date) as rn1,
ROW_NUMBER() OVER (ORDER BY amount) as rn2,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY date) as rn3
FROM transactions;Each distinct window definition potentially requires its own sort. PostgreSQL is smart about this — if two windows can share the same sort order, they will. But three different sort orders means three sorts. On a table with millions of rows, that adds up.
The fix: Try to use the same window definition wherever possible. Use named windows. If you truly need multiple sort orders, consider whether you can restructure the query.
Unlike WHERE clauses, window functions can't use indexes to skip rows. They always process all rows in their partition. A WHERE clause with an index might read 100 rows out of 10 million. A window function will read all 10 million.
The fix: Filter before windowing. Always put your WHERE clause on the base table, not on the windowed result:
-- Good: filter first, then window
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
WHERE created_at >= '2026-01-01' -- filter reduces the working set
) t
WHERE rn = 1;
-- Bad: window everything, then filter
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
-- no WHERE here = processes ALL rows
) t
WHERE rn = 1 AND created_at >= '2026-01-01';In PostgreSQL, CTEs used to be optimization fences (before version 12). Since PostgreSQL 12, the optimizer can inline CTEs. But if you use WITH ... AS MATERIALIZED, the CTE is computed and stored in a temp buffer. For large window function results, this can be useful or harmful depending on context. Profile both approaches.
Always check the execution plan:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
WHERE created_at >= '2026-01-01';Look for WindowAgg nodes and Sort nodes. If the Sort is taking most of the time, add an index. If the WindowAgg is slow, you have too many rows in your partitions.
When should you use each approach? Window functions when you need per-row computations over related rows: rankings, running totals, moving averages, and neighbor comparisons, all without collapsing the result. CTEs when you need to name intermediate results, reuse them in later steps, or break complex logic into readable stages. Subqueries when you need a quick inline scalar lookup or existence check that doesn't warrant its own named step.
In practice, the best queries combine all three:
WITH monthly_revenue AS (
-- CTE for readability and reuse
SELECT
DATE_TRUNC('month', order_date) as month,
category,
SUM(amount) as revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY 1, 2
),
with_trends AS (
-- Window functions for time-series analysis
SELECT
*,
LAG(revenue) OVER (PARTITION BY category ORDER BY month) as prev_month,
AVG(revenue) OVER (
PARTITION BY category
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3m,
RANK() OVER (PARTITION BY month ORDER BY revenue DESC) as category_rank
FROM monthly_revenue
)
SELECT * FROM with_trends
WHERE category_rank <= 5 -- top 5 categories per month
ORDER BY month, category_rank;

-- This does NOT work
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 10;Window functions are evaluated after WHERE. You must wrap in a subquery or CTE:
-- This works
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
FROM employees
) t
WHERE rn <= 10;

-- These are NOT the same:
SUM(salary) OVER (PARTITION BY department)
-- Frame: entire partition. Returns the same total for every row.
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date)
-- Frame: start of partition to current row. Returns a running total!

I once spent two hours debugging a dashboard because adding ORDER BY to a SUM window turned a total into a running total. The numbers looked "almost right" for the last row of each group, which made it incredibly confusing.
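This one bit me, so here's a minimal reproduction with Python's sqlite3 (three invented rows; hire is just a sequence number):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE emp (dept TEXT, hire INTEGER, salary INTEGER);
    INSERT INTO emp VALUES ('Eng', 1, 100), ('Eng', 2, 200), ('Eng', 3, 300);
""")

# No ORDER BY in the window: every row sees the whole partition.
total = [r[0] for r in conn.execute(
    "SELECT SUM(salary) OVER (PARTITION BY dept) FROM emp ORDER BY hire")]
print(total)    # [600, 600, 600]

# Adding ORDER BY silently shrinks the frame to "start through current row".
running = [r[0] for r in conn.execute(
    "SELECT SUM(salary) OVER (PARTITION BY dept ORDER BY hire) FROM emp "
    "ORDER BY hire")]
print(running)  # [100, 300, 600]
```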
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)

If two employees have the same salary, their ROW_NUMBER assignment is arbitrary. It might even change between executions. Always add a tiebreaker:
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC, employee_id -- tiebreaker
)

Already covered above but worth repeating because it's that common:
-- Broken (returns current row's value)
LAST_VALUE(name) OVER (PARTITION BY dept ORDER BY salary)
-- Fixed
LAST_VALUE(name) OVER (
PARTITION BY dept ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Window functions never reduce the number of rows. If you want "only the highest-paid employee per department," the window function marks them, but you still need to filter:
-- This returns ALL employees, just with a rank column
SELECT *, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as rk
FROM employees;
-- This returns only the top-paid per department
SELECT * FROM (
SELECT *, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as rk
FROM employees
) t WHERE rk = 1;

Window functions are part of the SQL standard (SQL:2003) and are supported by all major databases, but with varying levels of completeness: PostgreSQL since 8.4 (the most complete implementation), SQL Server since 2012 for full frame support, MySQL since 8.0, MariaDB since 10.2, and SQLite since 3.25.
If you're on MySQL 5.7 or earlier, you're out of luck. Upgrade or use the correlated-subquery workarounds. Seriously, upgrade.
Here are the patterns I reach for most often, collected in one place.
Top N per group:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col DESC) as rn
FROM table
) t WHERE rn <= N;

Running total:
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Moving average (7-day):
AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

Previous/next row value:
LAG(column, 1) OVER (PARTITION BY group_col ORDER BY sort_col)
LEAD(column, 1) OVER (PARTITION BY group_col ORDER BY sort_col)

Day-over-day change:
value - LAG(value) OVER (ORDER BY date) as change

Percent of total:
100.0 * value / SUM(value) OVER () as pct_of_total
100.0 * value / SUM(value) OVER (PARTITION BY group_col) as pct_of_group

Cumulative distribution:
CUME_DIST() OVER (ORDER BY value)
PERCENT_RANK() OVER (ORDER BY value)

Sessionization (gap-based grouping):
SUM(CASE WHEN gap > threshold THEN 1 ELSE 0 END) OVER (ORDER BY time) as session_id

Gap detection:
LEAD(seq_col) OVER (ORDER BY seq_col) - seq_col as gap_size

Islands (consecutive groups):
value - ROW_NUMBER() OVER (ORDER BY value) as island_group

Deduplication (keep latest):
ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY updated_at DESC) = 1

First/last in group:
FIRST_VALUE(col) OVER (PARTITION BY group_col ORDER BY sort_col)
LAST_VALUE(col) OVER (PARTITION BY group_col ORDER BY sort_col
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Anomaly detection:
ABS(value - AVG(value) OVER w) > 2 * STDDEV(value) OVER w

Window functions transformed how I think about data problems. Before, I'd instinctively reach for application code — pull the data, loop through it, compute what I need. Now, most of those computations happen in the query itself. The database is almost always faster at this than your application server. It has the data already. It has the indexes. It has decades of optimizer engineering. Let it do its job.
If you take one thing from this post, let it be this: next time you find yourself writing a correlated subquery, a self-join, or pulling data into your application to compute rankings, running totals, or row comparisons — stop and ask whether a window function would do it. The answer is almost always yes. And it'll be faster, more readable, and easier to maintain.
The syntax looks intimidating at first. SUM(amount) OVER (PARTITION BY category ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is a mouthful. But once you internalize the three components — function, partition, frame — it becomes second nature. Give it a week of deliberate practice. Write window functions even when you don't need them. Rewrite old queries. Explore EXPLAIN ANALYZE output. By the end of the week, you'll wonder how you ever lived without them.