What the SQL?!? Recursive

What the SQL?!? Recursive

Today’s “What the SQL?!?” features the keyword WINDOW. This clause allows us to elegantly select results from the previous results from the previous results from the previous results…

Please note, our target database is PostgreSQL. These examples may work with other databases, but might need some massaging to get them to work properly. Search online for the specific vendor’s documentation if errors pop up. Try searching for “WINDOW queries $DB_VENDOR_NAME”. Not all database vendors support the keyword WINDOW.

Create Sample Data

DROP TABLE IF EXISTS sample_moves;
CREATE TABLE sample_moves AS
  SELECT
    column1::int     AS id,
    column2::varchar AS name,
    column3::varchar AS address,
    column4::date AS moved_at
  FROM (
    VALUES
      (1, 'Alice' , '1 Main St', '2017-01-01'),
      (2, 'Bob'   , '2 Main St', '2017-02-01'),
      (3, 'Cat'   , '2 Main St', '2017-03-01'),
      (4, 'Dan Sr'  , '3 Main St',  '1970-04-01'),
      (5, 'Dan Jr'  , '3 Main St',  '2001-04-01'),
      (6, 'Dan 3rd' , '3 Main St', '2017-04-01')
  ) as t
;

CREATE INDEX ON sample_moves(address);

SELECT * FROM sample_moves;

Results:

idnameaddressmoved_at
1Alice1 Main St2017-01-01
2Bob2 Main St2017-02-01
3Cat2 Main St2017-03-01
4Dan Sr3 Main St1970-04-01
5Dan Jr3 Main St2001-04-01
6Dan 3rd3 Main St2017-04-01

Life Without Windows

A quick poem…

Eyes big and wide,
nothing seen inside.
Feeling around
nothing abound.
This things wet,
toxic I bet.
Closing my eyes,
still can’t rest.
Having a window,
would be best.

How many people live at each address?

Using a standard GROUP BY with COUNT we consolidate the records and count how many rows belong to each address.

Tip: COUNT(1) is more efficient than COUNT(*).

SELECT
  address,
  COUNT(1) total
FROM sample_moves
GROUP BY address
ORDER BY address;

Results:

addresstotal
1 Main St1
2 Main St2
3 Main St3

How many people live with each person?

Enter subquery land. Life without windows is not exciting.

SELECT
  *,
  (
    SELECT
      -- everyone at the address, minus the person
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others
FROM sample_moves t1;

Results:

idnameaddressmoved_atothers
1Alice1 Main St2017-01-010
2Bob2 Main St2017-02-011
3Cat2 Main St2017-03-011
4Dan Sr3 Main St1970-04-012
5Dan Jr3 Main St2001-04-012
6Dan 3rd3 Main St2017-04-012

JOIN works, too

SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves
  GROUP BY address
  ORDER BY address
) t2 USING (address);

And so does JOIN LATERAL

SELECT
  t1.*,
  t2.others
FROM sample_moves t1
JOIN LATERAL (
  SELECT
    address,
    COUNT(1) - 1 as others
  FROM sample_moves sub
  WHERE sub.address = t1.address
  GROUP BY address
  ORDER BY address
) t2 ON true;

That’s nice, but who moved in first?

SELECT
  *,
  (
    SELECT
      COUNT(1) - 1
    FROM sample_moves t2
    WHERE t2.address = t1.address
  ) AS others,
  (
    SELECT
      name
    FROM sample_moves t3
    WHERE t3.address = t1.address
    ORDER BY moved_at ASC
    LIMIT 1
  ) AS first_person
FROM sample_moves t1;

Wait I thought this was about windows?!?

The keyword OVER is the gateway drug into WINDOW functions. Using OVER with parenthesis is an inline window. The PARTITION BY keywords gives similar functionality to GROUP BY and JOIN ... USING all in one power packed statement. It can never reduce the number of records in a result set which is the same behavior expected of a correlated subquery.

PARTITION BY is treated the same as the traditional GROUP BY. The ORDER BY also has the same behavior as its use in a standard query.

SELECT
  *,
  (count(1) OVER (PARTITION BY address)) - 1                      AS others,
  first_value(name) OVER (PARTITION BY address ORDER BY moved_at) AS first_moved
FROM sample_moves;

Results

idnameaddressmoved_atothersfirst_moved
1Alice1 Main St2017-01-010Alice
2Bob2 Main St2017-02-011Bob
3Cat2 Main St2017-03-011Bob
4Dan Sr3 Main St1970-04-012Dan Sr
5Dan Jr3 Main St2001-04-012Dan Sr
6Dan 3rd3 Main St2017-04-012Dan Sr

A picture with arrows worth a thousand words:

Window SQL with arrows

That doesn’t look very DRY. Finally, a WINDOW

The WINDOW keyword allows us to alias the options of the OVER clause. Namely the expression (...) between and including the parenthesis.

In the following example we add the use of RANGE to provide additional direction to the windowing clause.

SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved
FROM sample_moves
WINDOW w AS (
  PARTITION BY address ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Results

idnameaddressmoved_atothersfirst_movedlast_moved
1Alice1 Main St2017-01-010AliceAlice
2Bob2 Main St2017-02-010BobBob
3Cat2 Main St2017-03-011BobCat
4Dan Sr3 Main St1970-04-010Dan SrDan Sr
5Dan Jr3 Main St2001-04-011Dan SrDan Jr
6Dan 3rd3 Main St2017-04-012Dan SrDan 3rd
-- Previous and Next Record
SELECT
  *,
  (count(1) OVER w) - 1 AS others,
  first_value(name) OVER w AS first_moved,
  last_value(name)  OVER w AS last_moved,
  lag(id) OVER (ORDER BY id) AS prev_id,
  lead(id) OVER (ORDER BY id) AS next_id
FROM sample_moves
WINDOW w AS (
  PARTITION BY address 
  ORDER BY moved_at
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY address;

Results

idnameaddressmoved_atothersfirst_movedlast_movedprev_idnext_id
1Alice1 Main St2017-01-010AliceAlice2
2Bob2 Main St2017-02-011BobCat13
3Cat2 Main St2017-03-011BobCat24
4Dan Sr3 Main St1970-04-012Dan SrDan 3rd35
5Dan Jr3 Main St2001-04-012Dan SrDan 3rd46
6Dan 3rd3 Main St2017-04-012Dan SrDan 3rd5

List Window Functions

Here is a list from Postgres docs of all the window functions. In addition to these, any regular aggregate function can be use within a window.

FunctionDescription
row_number()number of the current row within its partition, counting from 1
rank()rank of the current row with gaps; same as row_number of its first peer
dense_rank()rank of the current row without gaps; this function counts peer groups
percent_rank()relative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist()relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntileinteger ranging from 1 to the argument value, dividing the partition as equally as possible
lag()returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
lead()returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
first_value()returns value evaluated at the row that is the first row of the window frame
last_value()returns value evaluated at the row that is the last row of the window frame
nth_value()returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

References