Window functions are the single most underused SQL feature. Running totals, rankings, moving averages, gap analysis, and session detection — all without subqueries. Real examples from production databases.
I spent five years writing SQL before I truly learned window functions. Five years of correlated subqueries, self-joins, temporary tables, and application-layer gymnastics to solve problems that window functions handle in a single, readable expression. When I finally sat down and learned them properly, I had a genuine "why did nobody tell me about this" moment. I went back to old code and rewrote queries that had been bothering me for years. Half of them got shorter. Most of them got faster. A few went from "we run this overnight in a batch job" to "this returns in 400 milliseconds."
Window functions are the most powerful feature in SQL that most developers skip. They show up in interview prep, get filed under "advanced," and then ignored in day-to-day work. That's a mistake. Once you understand them, you'll reach for them constantly — running totals, rankings, moving averages, detecting gaps, sessionizing user events, computing year-over-year growth. All without subqueries. All in a single pass over the data.
This post is everything I wish someone had told me about window functions five years ago. Not the documentation. The real patterns, the real use cases, and the performance traps that don't show up in tutorials.
Here's the conceptual shift that took me too long to grasp: a regular aggregate function like SUM() or COUNT() collapses rows. You put in 1,000 rows, you get back 1 (or a few, with GROUP BY). A window function computes across a set of rows but keeps every row in the result. That's it. That's the whole idea.
-- Regular aggregate: collapses rows
SELECT department, SUM(salary)
FROM employees
GROUP BY department;
-- Result: one row per department
-- Window function: keeps all rows, adds the computation
SELECT name, department, salary,
SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;
-- Result: every employee row, with their department's total attached

The OVER() clause is what makes it a window function. Everything inside that clause defines the "window" — which rows the function can see when computing the result for the current row.
Think of it like this: for each row, the database opens a window onto a subset of the data, runs the function across that window, and writes the result back onto the current row. The window slides. The rows stay.
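The collapse-versus-keep distinction is easy to verify for yourself. Here's a minimal sketch using Python's built-in sqlite3 module (SQLite supports window functions as of 3.25); the tiny employees table is invented for illustration:

```python
import sqlite3

# Toy data, invented for illustration; requires SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Engineering', 145000),
        ('Bob',   'Engineering', 132000),
        ('Eve',   'Sales',        98000);
""")

# Regular aggregate: 3 input rows collapse to one row per department.
collapsed = conn.execute(
    "SELECT department, SUM(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(collapsed)  # [('Engineering', 277000), ('Sales', 98000)]

# Window function: all 3 rows survive, each carrying its department total.
windowed = conn.execute(
    "SELECT name, SUM(salary) OVER (PARTITION BY department) "
    "FROM employees ORDER BY name"
).fetchall()
print(windowed)  # [('Alice', 277000), ('Bob', 277000), ('Eve', 98000)]
```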
Every window function has the same structure:
function_name(...) OVER (
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
[frame_clause]
)

PARTITION BY splits the data into independent groups (like GROUP BY, but without collapsing). Each partition is processed separately.
ORDER BY defines the order within each partition. This matters a lot — it affects what LAG/LEAD return, how ROW_NUMBER assigns numbers, and critically, it changes the default frame for aggregate window functions.
Frame clause defines exactly which rows within the partition the function can see for the current row. This is where most people get confused, and where most bugs hide. More on this later.
Most developers encounter window functions through ranking. You need the "top 3 products per category" or "the most recent order per customer." This is where ROW_NUMBER, RANK, and DENSE_RANK live.
Assigns a unique sequential integer to each row within a partition:
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees;

name | department | salary | rank
------------|------------|---------|-----
Alice | Engineering| 145000 | 1
Bob | Engineering| 132000 | 2
Carol | Engineering| 128000 | 3
Dave | Engineering| 115000 | 4
Eve | Sales | 98000 | 1
Frank | Sales | 92000 | 2
Grace | Sales | 88000 | 3
The classic pattern — "top N per group" — uses this in a subquery or CTE:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;

This is one of the most common SQL interview questions, and also one of the most common real-world patterns. Before window functions, you'd solve this with a correlated subquery:
-- The old way (don't do this)
SELECT * FROM employees e1
WHERE (
SELECT COUNT(*) FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
) < 3;

That correlated subquery runs once per row in the outer query. On a table with 100,000 employees, that's 100,000 subqueries. The window function version does a single pass with a sort per partition.
The difference only matters when there are ties:
SELECT
name, score,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
RANK() OVER (ORDER BY score DESC) as rank,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank
FROM players;

name | score | row_num | rank | dense_rank
--------|-------|---------|------|----------
Alice | 100 | 1 | 1 | 1
Bob | 100 | 2 | 1 | 1
Carol | 95 | 3 | 3 | 2
Dave | 90 | 4 | 4 | 3
Eve | 90 | 5 | 4 | 3
Frank | 85 | 6 | 6 | 4
My rule of thumb: use ROW_NUMBER when you need exactly N results (pagination, dedup). Use RANK for competition-style rankings, where ties share a position and the next position is skipped. Use DENSE_RANK when downstream logic needs consecutive numbering with no gaps (tier or bucket assignments, for example).
NTILE divides the partition into N roughly equal buckets:
SELECT
name, salary,
NTILE(4) OVER (ORDER BY salary) as quartile
FROM employees;

name | salary | quartile
--------|---------|--------
Dave | 55000 | 1
Eve | 62000 | 1
Carol | 71000 | 2
Bob | 85000 | 2
Alice | 92000 | 3
Frank | 98000 | 3
Grace | 115000 | 4
Henry | 145000 | 4
I use this constantly for creating distribution buckets — salary quartiles, performance tiers, customer segments. The "roughly equal" part matters: if you have 10 rows and ask for 4 buckets, you get groups of 3, 3, 2, 2. Not always perfectly even.
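You can check the bucket-size arithmetic directly. A sketch with Python's sqlite3 (the values 0 through 9 are arbitrary stand-ins):

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

# NTILE(4) over 10 rows: the remainder rows go to the earliest buckets.
buckets = [row[0] for row in conn.execute("SELECT NTILE(4) OVER (ORDER BY v) FROM t")]
sizes = Counter(buckets)
print(sorted(sizes.items()))  # [(1, 3), (2, 3), (3, 2), (4, 2)]
```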
This is where window functions start getting genuinely exciting. LAG looks at the previous row. LEAD looks at the next row. Within the partition, according to the ORDER BY.
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
LEAD(revenue, 1) OVER (ORDER BY date) as next_day_revenue
FROM daily_revenue;

date | revenue | prev_day_revenue | next_day_revenue
-----------|---------|------------------|------------------
2026-01-01 | 12500 | NULL | 13200
2026-01-02 | 13200 | 12500 | 11800
2026-01-03 | 11800 | 13200 | 14100
2026-01-04 | 14100 | 11800 | 13900
2026-01-05 | 13900 | 14100 | NULL
The second argument is the offset (default 1). You can also provide a third argument as a default value instead of NULL:
LAG(revenue, 1, 0) OVER (ORDER BY date) -- returns 0 instead of NULL

SELECT
date,
revenue,
revenue - LAG(revenue) OVER (ORDER BY date) as daily_change,
ROUND(
100.0 * (revenue - LAG(revenue) OVER (ORDER BY date))
/ LAG(revenue) OVER (ORDER BY date),
2
) as pct_change
FROM daily_revenue;

date | revenue | daily_change | pct_change
-----------|---------|--------------|----------
2026-01-01 | 12500 | NULL | NULL
2026-01-02 | 13200 | 700 | 5.60
2026-01-03 | 11800 | -1400 | -10.61
2026-01-04 | 14100 | 2300 | 19.49
2026-01-05 | 13900 | -200 | -1.42
Before window functions, this required a self-join:
-- The painful way
SELECT
a.date,
a.revenue,
a.revenue - b.revenue as daily_change
FROM daily_revenue a
LEFT JOIN daily_revenue b
ON b.date = a.date - INTERVAL '1 day';

That self-join works fine until your data has gaps (weekends, holidays). Then you need to find the "previous existing date" instead of "yesterday," and the join gets ugly fast. LAG handles gaps naturally because it looks at the previous row in the result set, not the previous calendar day.
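Here's a quick way to see the difference, sketched with Python's sqlite3 and two invented rows that straddle a weekend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE rev (d TEXT, revenue INTEGER);
    -- A Friday and the following Monday: no rows for the weekend
    INSERT INTO rev VALUES ('2026-01-02', 100), ('2026-01-05', 120);
""")

# LAG: "previous row in the ordered result", so Monday sees Friday's value.
lag_prev = conn.execute(
    "SELECT LAG(revenue) OVER (ORDER BY d) FROM rev ORDER BY d").fetchall()
print(lag_prev)   # [(None,), (100,)]

# Calendar self-join: Monday looks for Jan 4, which has no row, and gets NULL.
join_prev = conn.execute("""
    SELECT b.revenue
    FROM rev a LEFT JOIN rev b ON b.d = date(a.d, '-1 day')
    ORDER BY a.d""").fetchall()
print(join_prev)  # [(None,), (None,)]
```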
This is a pattern I use in almost every analytics dashboard:
SELECT
date,
revenue,
LAG(revenue, 365) OVER (ORDER BY date) as revenue_last_year,
ROUND(
100.0 * (revenue - LAG(revenue, 365) OVER (ORDER BY date))
/ NULLIF(LAG(revenue, 365) OVER (ORDER BY date), 0),
2
) as yoy_growth_pct
FROM daily_revenue
WHERE date >= '2025-01-01';

One caveat: LAG(revenue, 365) counts rows, not calendar days, so this only works when the table has exactly one row per day (and it ignores leap years). Or more commonly, month-over-month with monthly aggregates:
WITH monthly AS (
SELECT
DATE_TRUNC('month', date) as month,
SUM(revenue) as total_revenue
FROM daily_revenue
GROUP BY DATE_TRUNC('month', date)
)
SELECT
month,
total_revenue,
LAG(total_revenue) OVER (ORDER BY month) as prev_month,
ROUND(
100.0 * (total_revenue - LAG(total_revenue) OVER (ORDER BY month))
/ NULLIF(LAG(total_revenue) OVER (ORDER BY month), 0),
2
) as mom_growth_pct
FROM monthly
ORDER BY month;

month | total_revenue | prev_month | mom_growth_pct
-----------|---------------|------------|---------------
2025-10-01 | 385000 | NULL | NULL
2025-11-01 | 412000 | 385000 | 7.01
2025-12-01 | 498000 | 412000 | 20.87
2026-01-01 | 376000 | 498000 | -24.50
2026-02-01 | 401000 | 376000 | 6.65
These return the first or last value in the window frame:
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
) as highest_paid,
LAST_VALUE(name) OVER (
PARTITION BY department ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as lowest_paid
FROM employees;

Notice the frame clause on LAST_VALUE. This is one of the most common window function bugs. Without it, the default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE returns the current row's value — completely useless. You almost always need to explicitly set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING with LAST_VALUE.
I've seen this bug in production code more times than I can count. Someone writes LAST_VALUE, gets weird results, assumes the function is broken, and rewrites it with a subquery. The function is fine. The default frame is just surprising.
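The bug is easy to reproduce. A sketch with Python's sqlite3 and three invented salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE emp (name TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('Alice', 145000), ('Bob', 132000), ('Dave', 115000);
""")

# Broken: the default frame stops at CURRENT ROW, so every row is its own "last".
broken = [r[0] for r in conn.execute(
    "SELECT LAST_VALUE(name) OVER (ORDER BY salary DESC) FROM emp "
    "ORDER BY salary DESC")]
print(broken)  # ['Alice', 'Bob', 'Dave']

# Fixed: extend the frame to the end of the partition.
fixed = [r[0] for r in conn.execute(
    "SELECT LAST_VALUE(name) OVER (ORDER BY salary DESC "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM emp "
    "ORDER BY salary DESC")]
print(fixed)   # ['Dave', 'Dave', 'Dave']
```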
This is where window functions go from "nice shortcut" to "I literally cannot imagine doing this without them."
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;

date | amount | running_total
-----------|--------|-------------
2026-01-01 | 150 | 150
2026-01-02 | -30 | 120
2026-01-03 | 200 | 320
2026-01-04 | 75 | 395
2026-01-05 | -50 | 345
That's it. One line. The running total accumulates as it goes through the ordered rows. Without window functions:
-- The old way: correlated subquery
SELECT
t1.date,
t1.amount,
(SELECT SUM(t2.amount)
FROM transactions t2
WHERE t2.date <= t1.date) as running_total
FROM transactions t1
ORDER BY t1.date;

The correlated subquery is O(n^2). For 100,000 rows, that's 10 billion operations. The window function is O(n log n) — dominated by the sort. On a table with 500,000 transactions, I've seen the difference go from 45 seconds to 200 milliseconds.
SELECT
date,
category,
amount,
SUM(amount) OVER (
PARTITION BY category
ORDER BY date
) as category_running_total
FROM transactions
ORDER BY category, date;

You can also combine several cumulative metrics in one query:

SELECT
date,
new_users,
SUM(new_users) OVER (ORDER BY date) as total_users,
COUNT(*) OVER (ORDER BY date) as day_number,
ROUND(
AVG(new_users) OVER (ORDER BY date),
2
) as avg_daily_signups
FROM user_signups;

date | new_users | total_users | day_number | avg_daily_signups
-----------|-----------|-------------|------------|------------------
2026-01-01 | 45 | 45 | 1 | 45.00
2026-01-02 | 52 | 97 | 2 | 48.50
2026-01-03 | 38 | 135 | 3 | 45.00
2026-01-04 | 67 | 202 | 4 | 50.50
2026-01-05 | 41 | 243 | 5 | 48.60
The frame clause is the part of window functions most people skip, and it's the part that unlocks the most interesting patterns. The frame defines exactly which rows the window function considers for the current row.
{ ROWS | RANGE | GROUPS } BETWEEN frame_start AND frame_end

Where frame_start and frame_end can be:

UNBOUNDED PRECEDING — the first row of the partition
N PRECEDING — N rows/values before the current row
CURRENT ROW — the current row
N FOLLOWING — N rows/values after the current row
UNBOUNDED FOLLOWING — the last row of the partition

The defaults are critical and trip up even experienced SQL developers. Without ORDER BY, the default frame is the entire partition (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING). With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That second default is why SUM() OVER (ORDER BY date) gives you a running total — the frame starts at the beginning of the partition and ends at the current row.
But there's a subtle gotcha with RANGE. RANGE groups rows that have the same ORDER BY value. If two transactions have the same date, they'll both see the sum including both of them:
-- Sample data with duplicate dates
-- 2026-01-01: 100, 2026-01-01: 50, 2026-01-02: 75
-- With RANGE (default when using ORDER BY):
SUM(amount) OVER (ORDER BY date)
-- Row 1 (Jan 1, 100): running_total = 150 (includes both Jan 1 rows!)
-- Row 2 (Jan 1, 50): running_total = 150 (same! both Jan 1 rows included)
-- Row 3 (Jan 2, 75): running_total = 225
-- With ROWS:
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Row 1 (Jan 1, 100): running_total = 100
-- Row 2 (Jan 1, 50): running_total = 150
-- Row 3 (Jan 2, 75): running_total = 225

If your ORDER BY column has unique values (timestamps down to the millisecond, sequential IDs), ROWS and RANGE behave identically. If it might have duplicates, you need to think carefully about which behavior you want.
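The peer-grouping behavior is worth verifying on real rows. A sketch with Python's sqlite3, using the same three invented transactions; rowid is added as a tiebreaker so the ROWS version is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE tx (d TEXT, amount INTEGER);
    INSERT INTO tx VALUES ('2026-01-01', 100), ('2026-01-01', 50), ('2026-01-02', 75);
""")

# Default frame (RANGE): both Jan 1 rows are peers, so both see 150.
range_totals = [r[0] for r in conn.execute(
    "SELECT SUM(amount) OVER (ORDER BY d) FROM tx ORDER BY d, rowid")]
print(range_totals)  # [150, 150, 225]

# ROWS with a rowid tiebreaker: a true row-by-row running total.
rows_totals = [r[0] for r in conn.execute(
    "SELECT SUM(amount) OVER (ORDER BY d, rowid "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM tx "
    "ORDER BY d, rowid")]
print(rows_totals)   # [100, 150, 225]
```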
GROUPS is less commonly used but useful. Instead of counting individual rows (ROWS) or value ranges (RANGE), it counts groups of peers — rows with the same ORDER BY value:
SUM(amount) OVER (ORDER BY date GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW)

This says "the current date group and the previous date group," regardless of how many rows each group has.
Moving averages are one of the most practical applications of frame clauses. They smooth out noise in time-series data.
SELECT
date,
revenue,
ROUND(
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
),
2
) as moving_avg_7d
FROM daily_revenue;

date | revenue | moving_avg_7d
-----------|---------|-------------
2026-01-01 | 12500 | 12500.00
2026-01-02 | 13200 | 12850.00
2026-01-03 | 11800 | 12500.00
2026-01-04 | 14100 | 12900.00
2026-01-05 | 13900 | 13100.00
2026-01-06 | 12200 | 12950.00
2026-01-07 | 15100 | 13257.14
2026-01-08 | 14800 | 13585.71
2026-01-09 | 13500 | 13628.57
2026-01-10 | 12900 | 13785.71
Notice: 6 PRECEDING AND CURRENT ROW gives you 7 rows total. The first 6 rows have fewer than 7 rows of history, so the average is over fewer data points. If you want to exclude incomplete windows:
SELECT * FROM (
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d,
COUNT(*) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as window_size
FROM daily_revenue
) t
WHERE window_size = 7;

For statistical analysis, you might want a centered window — looking both backward and forward:
SELECT
date,
revenue,
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) as centered_avg_7d
FROM daily_revenue;

Same pattern, different function. "Total revenue in the last 30 days" for each day:
SELECT
date,
revenue,
SUM(revenue) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) as rolling_30d_revenue
FROM daily_revenue;

Window functions pair beautifully with percentile functions for distribution analysis:
SELECT
name,
department,
salary,
PERCENT_RANK() OVER (ORDER BY salary) as pct_rank,
CUME_DIST() OVER (ORDER BY salary) as cumulative_dist,
NTILE(10) OVER (ORDER BY salary) as decile
FROM employees;

name | department | salary | pct_rank | cumulative_dist | decile
--------|-------------|---------|----------|-----------------|-------
Dave | Support | 52000 | 0.0000 | 0.1000 | 1
Eve | Support | 58000 | 0.1111 | 0.2000 | 2
Carol | Marketing | 65000 | 0.2222 | 0.3000 | 3
Frank | Marketing | 72000 | 0.3333 | 0.4000 | 4
Grace | Sales | 78000 | 0.4444 | 0.5000 | 5
Henry | Sales | 85000 | 0.5556 | 0.6000 | 6
Alice | Engineering | 92000 | 0.6667 | 0.7000 | 7
Bob | Engineering | 105000 | 0.7778 | 0.8000 | 8
Ivan | Engineering | 125000 | 0.8889 | 0.9000 | 9
Judy | Engineering | 145000 | 1.0000 | 1.0000 | 10
A real-world example — identifying where each employee falls within their department's salary distribution:
SELECT
name,
department,
salary,
MIN(salary) OVER w as dept_min,
MAX(salary) OVER w as dept_max,
ROUND(
100.0 * (salary - MIN(salary) OVER w)
/ NULLIF(MAX(salary) OVER w - MIN(salary) OVER w, 0),
1
) as dept_position_pct
FROM employees
WINDOW w AS (PARTITION BY department);That WINDOW clause at the bottom is a named window — you can define it once and reuse it. Cleaner than repeating OVER (PARTITION BY department) three times.
Sessionization is when window functions go from "useful" to "I don't know how I'd do this without them." The problem: you have a stream of user events with timestamps, and you need to group them into sessions. A new session starts when there's a gap of more than 30 minutes between events.
-- Step 1: Detect gaps
WITH gaps AS (
SELECT
user_id,
event_time,
event_type,
LAG(event_time) OVER (
PARTITION BY user_id ORDER BY event_time
) as prev_event_time,
CASE
WHEN event_time - LAG(event_time) OVER (
PARTITION BY user_id ORDER BY event_time
) > INTERVAL '30 minutes'
THEN 1
ELSE 0
END as is_new_session
FROM user_events
),
-- Step 2: Assign session IDs using a running sum of new-session flags
sessions AS (
SELECT
*,
SUM(is_new_session) OVER (
PARTITION BY user_id ORDER BY event_time
) + 1 as session_id
FROM gaps
)
SELECT
user_id,
session_id,
MIN(event_time) as session_start,
MAX(event_time) as session_end,
COUNT(*) as event_count,
MAX(event_time) - MIN(event_time) as session_duration
FROM sessions
GROUP BY user_id, session_id
ORDER BY user_id, session_start;

user_id | session_id | session_start | session_end | event_count | session_duration
--------|------------|---------------------|---------------------|-------------|----------------
1001 | 1 | 2026-01-15 09:02:00 | 2026-01-15 09:28:00 | 12 | 00:26:00
1001 | 2 | 2026-01-15 14:15:00 | 2026-01-15 14:52:00 | 8 | 00:37:00
1001 | 3 | 2026-01-15 20:30:00 | 2026-01-15 20:45:00 | 5 | 00:15:00
1002 | 1 | 2026-01-15 10:00:00 | 2026-01-15 11:30:00 | 22 | 01:30:00
The trick is the running sum. Every time there's a gap > 30 minutes, is_new_session is 1. The cumulative sum of those flags gives you a session counter that increments at each gap. This is a pattern I use in every analytics project.
Try doing this without window functions. It's a nightmare of self-joins and correlated subqueries that's also an order of magnitude slower.
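If you want to convince yourself the running-sum trick works, here's a compressed sketch with Python's sqlite3, using epoch seconds in place of timestamps and a 30-minute (1800-second) threshold; the four events are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER)")  # ts = epoch seconds
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1001, 0), (1001, 600),        # two events 10 minutes apart
    (1001, 3600), (1001, 3900),    # a 50-minute gap starts a new session
])

sessions = conn.execute("""
    WITH gaps AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
                    THEN 1 ELSE 0 END AS is_new_session
        FROM events
    ),
    numbered AS (
        SELECT *, SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY ts) + 1
                  AS session_id
        FROM gaps
    )
    SELECT session_id, COUNT(*) FROM numbered
    GROUP BY user_id, session_id ORDER BY session_id
""").fetchall()
print(sessions)  # [(1, 2), (2, 2)] — two sessions of two events each
```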
Gap-and-island is a classic SQL problem family. You have a sequence of values and need to find contiguous groups (islands) separated by gaps.
Given a table of user login dates, find streaks of consecutive daily logins:
WITH login_data AS (
SELECT DISTINCT user_id, login_date::date as login_date
FROM user_logins
),
grouped AS (
SELECT
user_id,
login_date,
login_date - (ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
))::int as grp
FROM login_data
)
SELECT
user_id,
MIN(login_date) as streak_start,
MAX(login_date) as streak_end,
COUNT(*) as streak_length
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3 -- only streaks of 3+ days
ORDER BY user_id, streak_start;

The magic: if you have consecutive dates (Jan 1, Jan 2, Jan 3) and subtract a sequential row number (1, 2, 3), you get the same result for all of them (Dec 31, Dec 31, Dec 31). Non-consecutive dates produce different groups. It's elegant and fast.
user_id | streak_start | streak_end | streak_length
--------|--------------|-------------|-------------
1001 | 2026-01-03 | 2026-01-08 | 6
1001 | 2026-01-15 | 2026-01-18 | 4
1002 | 2026-01-01 | 2026-01-05 | 5
1002 | 2026-01-10 | 2026-01-13 | 4
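The date-minus-row-number trick is compact enough to test in a few lines. A sketch with Python's sqlite3, using integer day numbers in place of dates (the arithmetic is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.execute("CREATE TABLE logins (day INTEGER)")  # day number stands in for a date
conn.executemany("INSERT INTO logins VALUES (?)",
                 [(d,) for d in [1, 2, 3, 7, 8, 12]])

# Consecutive days minus their row numbers collapse to the same group key.
streaks = conn.execute("""
    WITH grouped AS (
        SELECT day, day - ROW_NUMBER() OVER (ORDER BY day) AS grp
        FROM logins
    )
    SELECT MIN(day), MAX(day), COUNT(*) FROM grouped
    GROUP BY grp ORDER BY MIN(day)
""").fetchall()
print(streaks)  # [(1, 3, 3), (7, 8, 2), (12, 12, 1)]
```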
The inverse problem — find missing values in a sequence:
SELECT
invoice_number,
LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_invoice,
LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number as gap_size
FROM invoices
WHERE LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number > 1;

Actually, that won't work directly in a WHERE clause (window functions can't be in WHERE). Wrap it:
SELECT * FROM (
SELECT
invoice_number,
LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_invoice,
LEAD(invoice_number) OVER (ORDER BY invoice_number) - invoice_number as gap_size
FROM invoices
) t
WHERE gap_size > 1;

invoice_number | next_invoice | gap_size
---------------|--------------|--------
1047 | 1052 | 5
1089 | 1091 | 2
1203 | 1210 | 7
When you're using the same window definition multiple times, named windows keep things readable:
SELECT
date,
revenue,
SUM(revenue) OVER w as running_total,
AVG(revenue) OVER w as running_avg,
COUNT(*) OVER w as running_count,
MIN(revenue) OVER w as running_min,
MAX(revenue) OVER w as running_max
FROM daily_revenue
WINDOW w AS (ORDER BY date);You can also build on named windows:
SELECT
date,
category,
revenue,
SUM(revenue) OVER (w ORDER BY date) as running_total,
AVG(revenue) OVER (w ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7d
FROM daily_revenue
WINDOW w AS (PARTITION BY category);

Let me show you some patterns I've used in actual production systems. These aren't toy examples.
You get duplicate records from an upstream data feed. Keep only the latest version of each record:
DELETE FROM raw_events
WHERE id IN (
SELECT id FROM (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY event_key
ORDER BY received_at DESC
) as rn
FROM raw_events
) t
WHERE rn > 1
);

Or for a read-only view of deduplicated data:
CREATE VIEW clean_events AS
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY event_key
ORDER BY received_at DESC
) as rn
FROM raw_events
) t
WHERE rn = 1;

Track conversion through a multi-step process:
WITH user_steps AS (
SELECT
user_id,
step_name,
MIN(created_at) as first_completed_at
FROM funnel_events
WHERE step_name IN ('signup', 'onboarding', 'first_purchase', 'subscription')
GROUP BY user_id, step_name
),
funnel AS (
SELECT
user_id,
step_name,
first_completed_at,
LAG(first_completed_at) OVER (
PARTITION BY user_id
ORDER BY first_completed_at
) as prev_step_time,
LAG(step_name) OVER (
PARTITION BY user_id
ORDER BY first_completed_at
) as prev_step_name
FROM user_steps
)
SELECT
step_name,
COUNT(*) as users_reached,
COUNT(prev_step_time) as users_from_prev_step,
ROUND(
AVG(EXTRACT(EPOCH FROM (first_completed_at - prev_step_time)) / 3600),
1
) as avg_hours_from_prev
FROM funnel
GROUP BY step_name
ORDER BY MIN(first_completed_at);

How long does it take users between purchases?
SELECT
user_id,
order_date,
LAG(order_date) OVER (
PARTITION BY user_id ORDER BY order_date
) as prev_order_date,
order_date - LAG(order_date) OVER (
PARTITION BY user_id ORDER BY order_date
) as days_between_orders,
ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY order_date
) as order_number
FROM orders;

user_id | order_date | prev_order_date | days_between | order_number
--------|------------|-----------------|--------------|-------------
1001 | 2025-06-15 | NULL | NULL | 1
1001 | 2025-08-22 | 2025-06-15 | 68 | 2
1001 | 2025-09-10 | 2025-08-22 | 19 | 3
1001 | 2025-12-01 | 2025-09-10 | 82 | 4
1002 | 2025-07-03 | NULL | NULL | 1
1002 | 2025-07-28 | 2025-07-03 | 25 | 2
Flag values that are more than 2 standard deviations from the rolling mean:
SELECT * FROM (
SELECT
date,
metric_value,
AVG(metric_value) OVER w as rolling_avg,
STDDEV(metric_value) OVER w as rolling_stddev,
CASE
WHEN ABS(metric_value - AVG(metric_value) OVER w)
> 2 * STDDEV(metric_value) OVER w
THEN true
ELSE false
END as is_anomaly
FROM metrics
WINDOW w AS (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
) t
WHERE is_anomaly;

Find which products account for 80% of revenue (the 80/20 rule):
WITH product_revenue AS (
SELECT
product_id,
product_name,
SUM(amount) as total_revenue
FROM order_items
GROUP BY product_id, product_name
),
ranked AS (
SELECT
*,
-- product_id as a tiebreaker keeps the default RANGE frame from
-- lumping products with identical revenue into one cumulative step
SUM(total_revenue) OVER (ORDER BY total_revenue DESC, product_id) as cumulative_revenue,
SUM(total_revenue) OVER () as grand_total,
ROUND(
100.0 * SUM(total_revenue) OVER (ORDER BY total_revenue DESC, product_id)
/ SUM(total_revenue) OVER (),
2
) as cumulative_pct,
ROW_NUMBER() OVER (ORDER BY total_revenue DESC, product_id) as rank
FROM product_revenue
)
SELECT * FROM ranked
WHERE cumulative_pct <= 80
ORDER BY rank;

Window functions are not free. Here's what I've learned about their performance characteristics the hard way.
Most window functions require sorting. PARTITION BY a ORDER BY b means the database needs the data sorted by (a, b). If you don't have an index that provides this order, that's a sort operation, and sorts are expensive on large datasets.
The fix: Create indexes that match your window function's PARTITION BY + ORDER BY:
-- For: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
-- For: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
CREATE INDEX idx_events_user_created ON events (user_id, created_at DESC);

Another trap: multiple window definitions with different sort orders in one query.

SELECT
ROW_NUMBER() OVER (ORDER BY date) as rn1,
ROW_NUMBER() OVER (ORDER BY amount) as rn2,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY date) as rn3
FROM transactions;Each distinct window definition potentially requires its own sort. PostgreSQL is smart about this — if two windows can share the same sort order, they will. But three different sort orders means three sorts. On a table with millions of rows, that adds up.
The fix: Try to use the same window definition wherever possible. Use named windows. If you truly need multiple sort orders, consider whether you can restructure the query.
Unlike WHERE clauses, window functions can't use indexes to skip rows. They always process all rows in their partition. A WHERE clause with an index might read 100 rows out of 10 million. A window function will read all 10 million.
The fix: Filter before windowing. Always put your WHERE clause on the base table, not on the windowed result:
-- Good: filter first, then window
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
WHERE created_at >= '2026-01-01' -- filter reduces the working set
) t
WHERE rn = 1;
-- Bad: window everything, then filter
SELECT * FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
-- no WHERE here = processes ALL rows
) t
WHERE rn = 1 AND created_at >= '2026-01-01';In PostgreSQL, CTEs used to be optimization fences (before version 12). Since PostgreSQL 12, the optimizer can inline CTEs. But if you use WITH ... AS MATERIALIZED, the CTE is computed and stored in a temp buffer. For large window function results, this can be useful or harmful depending on context. Profile both approaches.
Always check the execution plan:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
user_id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM events
WHERE created_at >= '2026-01-01';Look for WindowAgg nodes and Sort nodes. If the Sort is taking most of the time, add an index. If the WindowAgg is slow, you have too many rows in your partitions.
When should you use each approach? Window functions when you need per-row computations over related rows: rankings, running totals, moving averages, and neighbor comparisons, all without collapsing the result. CTEs when you need to name intermediate results, reuse them in later steps, or break complex logic into readable stages. Subqueries when you need a quick inline scalar lookup or existence check that doesn't warrant its own named step.
In practice, the best queries combine all three:
WITH monthly_revenue AS (
-- CTE for readability and reuse
SELECT
DATE_TRUNC('month', order_date) as month,
category,
SUM(amount) as revenue
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY 1, 2
),
with_trends AS (
-- Window functions for time-series analysis
SELECT
*,
LAG(revenue) OVER (PARTITION BY category ORDER BY month) as prev_month,
AVG(revenue) OVER (
PARTITION BY category
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3m,
RANK() OVER (PARTITION BY month ORDER BY revenue DESC) as category_rank
FROM monthly_revenue
)
SELECT * FROM with_trends
WHERE category_rank <= 5 -- top 5 categories per month
ORDER BY month, category_rank;

-- This does NOT work
SELECT * FROM employees
WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 10;Window functions are evaluated after WHERE. You must wrap in a subquery or CTE:
-- This works
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
FROM employees
) t
WHERE rn <= 10;

-- These are NOT the same:
SUM(salary) OVER (PARTITION BY department)
-- Frame: entire partition. Returns the same total for every row.
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date)
-- Frame: start of partition to current row. Returns a running total!

I once spent two hours debugging a dashboard because adding ORDER BY to a SUM window turned a total into a running total. The numbers looked "almost right" for the last row of each group, which made it incredibly confusing.
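This one bit me, so here's a minimal reproduction with Python's sqlite3 (three invented rows; hire is just a sequence number):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+
conn.executescript("""
    CREATE TABLE emp (dept TEXT, hire INTEGER, salary INTEGER);
    INSERT INTO emp VALUES ('Eng', 1, 100), ('Eng', 2, 200), ('Eng', 3, 300);
""")

# No ORDER BY in the window: every row sees the whole partition.
total = [r[0] for r in conn.execute(
    "SELECT SUM(salary) OVER (PARTITION BY dept) FROM emp ORDER BY hire")]
print(total)    # [600, 600, 600]

# Adding ORDER BY silently shrinks the frame to "start through current row".
running = [r[0] for r in conn.execute(
    "SELECT SUM(salary) OVER (PARTITION BY dept ORDER BY hire) FROM emp "
    "ORDER BY hire")]
print(running)  # [100, 300, 600]
```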
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)

If two employees have the same salary, their ROW_NUMBER assignment is arbitrary. It might even change between executions. Always add a tiebreaker:
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC, employee_id -- tiebreaker
)

Already covered above but worth repeating because it's that common:
-- Broken (returns current row's value)
LAST_VALUE(name) OVER (PARTITION BY dept ORDER BY salary)
-- Fixed
LAST_VALUE(name) OVER (
PARTITION BY dept ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Window functions never reduce the number of rows. If you want "only the highest-paid employee per department," the window function marks them, but you still need to filter:
-- This returns ALL employees, just with a rank column
SELECT *, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as rk
FROM employees;
-- This returns only the top-paid per department
SELECT * FROM (
SELECT *, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as rk
FROM employees
) t WHERE rk = 1;

Window functions are part of the SQL standard (SQL:2003) and are supported by all major databases, but with varying levels of completeness: PostgreSQL since 8.4 (the most complete implementation), SQL Server since 2012 for full frame support, MySQL since 8.0, MariaDB since 10.2, and SQLite since 3.25.
If you're on MySQL 5.7 or earlier, you're out of luck. Upgrade or use the correlated-subquery workarounds. Seriously, upgrade.
Here are the patterns I reach for most often, collected in one place.
Top N per group:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY sort_col DESC) as rn
FROM table
) t WHERE rn <= N;

Running total:
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Moving average (7-day):
AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

Previous/next row value:
LAG(column, 1) OVER (PARTITION BY group_col ORDER BY sort_col)
LEAD(column, 1) OVER (PARTITION BY group_col ORDER BY sort_col)

Day-over-day change:
value - LAG(value) OVER (ORDER BY date) as change

Percent of total:
100.0 * value / SUM(value) OVER () as pct_of_total
100.0 * value / SUM(value) OVER (PARTITION BY group_col) as pct_of_group

Cumulative distribution:
CUME_DIST() OVER (ORDER BY value)
PERCENT_RANK() OVER (ORDER BY value)

Sessionization (gap-based grouping):
SUM(CASE WHEN gap > threshold THEN 1 ELSE 0 END) OVER (ORDER BY time) as session_id

Gap detection:
LEAD(seq_col) OVER (ORDER BY seq_col) - seq_col as gap_size

Islands (consecutive groups):
value - ROW_NUMBER() OVER (ORDER BY value) as island_group

Deduplication (keep latest):
ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY updated_at DESC) = 1

First/last in group:
FIRST_VALUE(col) OVER (PARTITION BY group_col ORDER BY sort_col)
LAST_VALUE(col) OVER (PARTITION BY group_col ORDER BY sort_col
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

Anomaly detection:
ABS(value - AVG(value) OVER w) > 2 * STDDEV(value) OVER w

Window functions transformed how I think about data problems. Before, I'd instinctively reach for application code — pull the data, loop through it, compute what I need. Now, most of those computations happen in the query itself. The database is almost always faster at this than your application server. It has the data already. It has the indexes. It has decades of optimizer engineering. Let it do its job.
If you take one thing from this post, let it be this: next time you find yourself writing a correlated subquery, a self-join, or pulling data into your application to compute rankings, running totals, or row comparisons — stop and ask whether a window function would do it. The answer is almost always yes. And it'll be faster, more readable, and easier to maintain.
The syntax looks intimidating at first. SUM(amount) OVER (PARTITION BY category ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) is a mouthful. But once you internalize the three components — function, partition, frame — it becomes second nature. Give it a week of deliberate practice. Write window functions even when you don't need them. Rewrite old queries. Explore EXPLAIN ANALYZE output. By the end of the week, you'll wonder how you ever lived without them.