What the SQL?!? Recursive

2017-03-22 sql

Today’s “What the SQL?!?” features the keyword WINDOW. This clause allows us to elegantly select results from the previous results from the previous results from the previous results…

Please note, our target database is PostgreSQL. These examples may work with other databases, but might need some massaging to get them to work properly. Search online for the specific vendor’s documentation if errors pop up. Try searching for “WINDOW queries $DB_VENDOR_NAME”. Not all database vendors support the keyword WINDOW.

Create Sample Data

DROP TABLE IF EXISTS sample_moves;
CREATE TABLE sample_moves AS
  SELECT
    column1::int     AS id,
    column2::varchar AS name,
    column3::varchar AS address,
    column4::date AS moved_at
  FROM (
    VALUES
      (1, 'Alice' , '1 Main St', '2017-01-01'),
      (2, 'Bob'   , '2 Main St', '2017-02-01'),
      (3, 'Cat'   , '2 Main St', '2017-03-01'),
      (4, 'Dan Sr'  , '3 Main St',  '1970-04-01'),
      (5, 'Dan Jr'  , '3 Main St',  '2001-04-01'),
      (6, 'Dan 3rd' , '3 Main St', '2017-04-01')
  ) as t
;

CREATE INDEX ON sample_moves(address);

SELECT * FROM sample_moves;

Results:

id	name	address	moved_at
1	Alice	1 Main St	2017-01-01
2	Bob	2 Main St	2017-02-01
3	Cat	2 Main St	2017-03-01
4	Dan Sr	3 Main St	1970-04-01
5	Dan Jr	3 Main St	2001-04-01
6	Dan 3rd	3 Main St	2017-04-01

Life Without Windows

A quick poem…

Eyes big and wide,
nothing seen inside.
Feeling around
nothing abound.
This things wet,
toxic I bet.
Closing my eyes,
still can’t rest.
Having a window,
would be best.

How many people live at each address?

Using a standard GROUP BY with COUNT we consolidate the records and count how many rows belong to each address.

Tip: COUNT(1) is more efficient than COUNT(*).

SELECT
  address,
  COUNT(1) total
FROM sample_moves
GROUP BY address
ORDER BY address;

Results:

address	total
1 Main St	1
2 Main St	2
3 Main St	3

How many people live with each person?

Enter subquery land. Life without windows is not exciting.

SELECT
  *,
  (
    SELECT
      -- everyone at the address, minus the person
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others
FROM sample_moves t1;

Results:

id	name	address	moved_at	others
1	Alice	1 Main St	2017-01-01	0
2	Bob	2 Main St	2017-02-01	1
3	Cat	2 Main St	2017-03-01	1
4	Dan Sr	3 Main St	1970-04-01	2
5	Dan Jr	3 Main St	2001-04-01	2
6	Dan 3rd	3 Main St	2017-04-01	2

`JOIN` works, too

SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves
  GROUP BY address
  ORDER BY address
) t2 USING (address);

And so does `JOIN LATERAL`

SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN LATERAL (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves sub
  WHERE sub.address = t1.address
  GROUP BY address
  ORDER BY address
) t2 ON true;

That’s nice, but who moved in first?

SELECT
  *,
  (
    SELECT
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others,
  (
    SELECT
      name
    FROM sample_moves t3
    WHERE t3.address = t1.address
    ORDER BY moved_at ASC
    LIMIT 1
  ) AS first_person
FROM sample_moves t1;

Wait I thought this was about windows?!?

The keyword OVER is the gateway drug into WINDOW functions. Using OVER with parenthesis is an inline window. The PARTITION BY keywords gives similar functionality to GROUP BY and JOIN ... USING all in one power packed statement. It can never reduce the number of records in a result set which is the same behavior expected of a correlated subquery.

PARTITION BY is treated the same as the traditional GROUP BY. The ORDER BY also has the same behavior as its use in a standard query.

SELECT
  *,
  (count(1) OVER (PARTITION BY address)) - 1                      AS others,
  first_value(name) OVER (PARTITION BY address ORDER BY moved_at) AS first_moved
FROM sample_moves;

Results

id	name	address	moved_at	others	first_moved
1	Alice	1 Main St	2017-01-01	0	Alice
2	Bob	2 Main St	2017-02-01	1	Bob
3	Cat	2 Main St	2017-03-01	1	Bob
4	Dan Sr	3 Main St	1970-04-01	2	Dan Sr
5	Dan Jr	3 Main St	2001-04-01	2	Dan Sr
6	Dan 3rd	3 Main St	2017-04-01	2	Dan Sr

A picture with arrows worth a thousand words:

That doesn’t look very DRY. Finally, a `WINDOW`

The WINDOW keyword allows us to alias the options of the OVER clause. Namely the expression (...) between and including the parenthesis.

In the following example we add the use of RANGE to provide additional direction to the windowing clause.

SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved
FROM sample_moves
WINDOW w AS (
  PARTITION BY address ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Results

id	name	address	moved_at	others	first_moved	last_moved
1	Alice	1 Main St	2017-01-01	0	Alice	Alice
2	Bob	2 Main St	2017-02-01	0	Bob	Bob
3	Cat	2 Main St	2017-03-01	1	Bob	Cat
4	Dan Sr	3 Main St	1970-04-01	0	Dan Sr	Dan Sr
5	Dan Jr	3 Main St	2001-04-01	1	Dan Sr	Dan Jr
6	Dan 3rd	3 Main St	2017-04-01	2	Dan Sr	Dan 3rd

-- Previous and Next Record
SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved,
  lag(id) OVER (ORDER BY id) AS prev_id,
  lead(id) OVER (ORDER BY id) AS next_id
FROM sample_moves
WINDOW w AS (
  PARTITION BY address 
  ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY address;

Results

id	name	address	moved_at	others	first_moved	last_moved	prev_id	next_id
1	Alice	1 Main St	2017-01-01	0	Alice	Alice		2
2	Bob	2 Main St	2017-02-01	1	Bob	Cat	1	3
3	Cat	2 Main St	2017-03-01	1	Bob	Cat	2	4
4	Dan Sr	3 Main St	1970-04-01	2	Dan Sr	Dan 3rd	3	5
5	Dan Jr	3 Main St	2001-04-01	2	Dan Sr	Dan 3rd	4	6
6	Dan 3rd	3 Main St	2017-04-01	2	Dan Sr	Dan 3rd	5

List Window Functions

Here is a list from Postgres docs of all the window functions. In addition to these, any regular aggregate function can be use within a window.

Function	Description
row_number()	number of the current row within its partition, counting from 1
rank()	rank of the current row with gaps; same as row_number of its first peer
dense_rank()	rank of the current row without gaps; this function counts peer groups
percent_rank()	relative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist()	relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile	integer ranging from 1 to the argument value, dividing the partition as equally as possible
lag()	returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
lead()	returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
first_value()	returns value evaluated at the row that is the first row of the window frame
last_value()	returns value evaluated at the row that is the last row of the window frame
nth_value()	returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

References

Postgres Window Tutorial: https://www.postgresql.org/docs/9.3/static/tutorial-window.html
Postgres Window Functions: https://www.postgresql.org/docs/9.3/static/functions-window.html
Postgres Window Syntax: https://www.postgresql.org/docs/9.3/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

Create Sample Data

Results:

Life Without Windows

How many people live at each address?

Results:

How many people live with each person?

Results:

JOIN works, too

And so does JOIN LATERAL

That’s nice, but who moved in first?

Wait I thought this was about windows?!?

Results

That doesn’t look very DRY. Finally, a WINDOW

Results

Results

List Window Functions

References

`JOIN` works, too

And so does `JOIN LATERAL`

That doesn’t look very DRY. Finally, a `WINDOW`