What the SQL?!? WINDOW

written in sql, talk

Today’s “What the SQL?!?” features the keyword WINDOW. This clause allows us to elegantly select results from the previous results from the previous results from the previous results…

Please note, our target database is PostgreSQL. These examples may work with other databases, but might need some massaging to get them to work properly. Search online for the specific vendor’s documentation if errors pop up. Try searching for “WINDOW queries ”. Not all database vendors support the keyword WINDOW.

Create Sample Data

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
DROP TABLE IF EXISTS sample_moves;
CREATE TABLE sample_moves AS
  SELECT
    column1::int     AS id,
    column2::varchar AS name,
    column3::varchar AS address,
    column4::date AS moved_at
  FROM (
    VALUES
      (1, 'Alice' , '1 Main St', '2017-01-01'),
      (2, 'Bob'   , '2 Main St', '2017-02-01'),
      (3, 'Cat'   , '2 Main St', '2017-03-01'),
      (4, 'Dan Sr'  , '3 Main St',  '1970-04-01'),
      (5, 'Dan Jr'  , '3 Main St',  '2001-04-01'),
      (6, 'Dan 3rd' , '3 Main St', '2017-04-01')
  ) as t
;

CREATE INDEX ON sample_moves(address);

SELECT * FROM sample_moves;

Results:

id name address moved_at
1 Alice 1 Main St 2017-01-01
2 Bob 2 Main St 2017-02-01
3 Cat 2 Main St 2017-03-01
4 Dan Sr 3 Main St 1970-04-01
5 Dan Jr 3 Main St 2001-04-01
6 Dan 3rd 3 Main St 2017-04-01

Life Without Windows

A quick poem…

Eyes big and wide,
nothing seen inside.
Feeling around
nothing abound.
This things wet,
toxic I bet.
Closing my eyes,
still can’t rest.
Having a window,
would be best.

How many people live at each address?

Using a standard GROUP BY with COUNT we consolidate the records and count how many rows belong to each address.

Tip: COUNT(1) is more efficient than COUNT(*).

1
2
3
4
5
6
SELECT
  address,
  COUNT(1) total
FROM sample_moves
GROUP BY address
ORDER BY address;

Results:

address total
1 Main St 1
2 Main St 2
3 Main St 3

How many people live with each person?

Enter subquery land. Life without windows is not exciting.

1
2
3
4
5
6
7
8
9
10
SELECT
  *,
  (
    SELECT
      -- everyone at the address, minus the person
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others
FROM sample_moves t1;

Results:

id name address moved_at others
1 Alice 1 Main St 2017-01-01 0
2 Bob 2 Main St 2017-02-01 1
3 Cat 2 Main St 2017-03-01 1
4 Dan Sr 3 Main St 1970-04-01 2
5 Dan Jr 3 Main St 2001-04-01 2
6 Dan 3rd 3 Main St 2017-04-01 2

JOIN works, too

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves
  GROUP BY address
  ORDER BY address
) t2 USING (address);

And so does JOIN LATERAL

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN LATERAL (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves sub
  WHERE sub.address = t1.address
  GROUP BY address
  ORDER BY address
) t2 ON true;

That’s nice, but who moved in first?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
  *,
  (
    SELECT
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others,
  (
    SELECT
      name
    FROM sample_moves t3
    WHERE t3.address = t1.address
    ORDER BY moved_at ASC
    LIMIT 1
  ) AS first_person
FROM sample_moves t1;

Wait I thought this was about windows?!?

The keyword OVER is the gateway drug into WINDOW functions. Using OVER with parenthesis is an inline window. The PARTITION BY keywords gives similar functionality to GROUP BY and JOIN ... USING all in one power packed statement. It can never reduce the number of records in a result set which is the same behavior expected of a correlated subquery.

PARTITION BY is treated the same as the traditional GROUP BY. The ORDER BY also has the same behavior as its use in a standard query.

1
2
3
4
5
SELECT
  *,
  (count(1) OVER (PARTITION BY address)) - 1                      AS others,
  first_value(name) OVER (PARTITION BY address ORDER BY moved_at) AS first_moved
FROM sample_moves;

Results

id name address moved_at others first_moved
1 Alice 1 Main St 2017-01-01 0 Alice
2 Bob 2 Main St 2017-02-01 1 Bob
3 Cat 2 Main St 2017-03-01 1 Bob
4 Dan Sr 3 Main St 1970-04-01 2 Dan Sr
5 Dan Jr 3 Main St 2001-04-01 2 Dan Sr
6 Dan 3rd 3 Main St 2017-04-01 2 Dan Sr

A picture with arrows worth a thousand words:

That doesn’t look very DRY. Finally, a WINDOW

The WINDOW keyword allows us to alias the options of the OVER clause. Namely the expression (...) between and including the parenthesis.

In the following example we add the use of RANGE to provide additional direction to the windowing clause.

1
2
3
4
5
6
7
8
9
10
SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved
FROM sample_moves
WINDOW w AS (
  PARTITION BY address ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Results

id name address moved_at others first_moved last_moved
1 Alice 1 Main St 2017-01-01 0 Alice Alice
2 Bob 2 Main St 2017-02-01 0 Bob Bob
3 Cat 2 Main St 2017-03-01 1 Bob Cat
4 Dan Sr 3 Main St 1970-04-01 0 Dan Sr Dan Sr
5 Dan Jr 3 Main St 2001-04-01 1 Dan Sr Dan Jr
6 Dan 3rd 3 Main St 2017-04-01 2 Dan Sr Dan 3rd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Previous and Next Record
SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved,
  lag(id) OVER (ORDER BY id) AS prev_id,
  lead(id) OVER (ORDER BY id) AS next_id
FROM sample_moves
WINDOW w AS (
  PARTITION BY address
  ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY address;

Results

id name address moved_at others first_moved last_moved prev_id next_id
1 Alice 1 Main St 2017-01-01 0 Alice Alice 2
2 Bob 2 Main St 2017-02-01 1 Bob Cat 1 3
3 Cat 2 Main St 2017-03-01 1 Bob Cat 2 4
4 Dan Sr 3 Main St 1970-04-01 2 Dan Sr Dan 3rd 3 5
5 Dan Jr 3 Main St 2001-04-01 2 Dan Sr Dan 3rd 4 6
6 Dan 3rd 3 Main St 2017-04-01 2 Dan Sr Dan 3rd 5

List Window Functions

Here is a list from Postgres docs of all the window functions. In addition to these, any regular aggregate function can be use within a window.

Function Description
row_number() number of the current row within its partition, counting from 1
rank() rank of the current row with gaps; same as row_number of its first peer
dense_rank() rank of the current row without gaps; this function counts peer groups
percent_rank() relative rank of the current row: (rank - 1) / (total rows – 1)
cume_dist() relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile integer ranging from 1 to the argument value, dividing the partition as equally as possible
lag() returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
lead() returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
first_value() returns value evaluated at the row that is the first row of the window frame
last_value() returns value evaluated at the row that is the last row of the window frame
nth_value() returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

References