Add GROUPS option to the Window Functions

From: Oliver Ford <ojford(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Add GROUPS option to the Window Functions
Date: 2017-12-04 14:11:46
Message-ID: CAGMVOdtWkb9X7dUh7vjaCaiH34UGFg88unXYTEOub0Rk0swSXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Adds the GROUPS option to the window framing clause. This further
resolves TODO list item "Implement full support for window framing
clauses" and implements SQL:2011 T620. No other mainstream db has this
feature.

Apply this on top of my previous patch available here:
https://www.postgresql.org/message-id/CAGMVOdvETRCKpeT06Uoq5RsNUOdH7d1iYy7C1Pze%3DL5%3DgBzs-Q%40mail.gmail.com

== Specification ==

The GROUPS option is defined in SQL:2011 in addition to ROWS and
RANGE. Where ROWS calculate frame bounds by the number of rows before
and after the current row, and RANGE by the values of an ORDER BY
column, GROUPS calculates frame bounds by the number of changes to the
values of the ORDER BY columns.

GROUPS behaves similar to RANGE in that if two rows are peers, they
are both included in the frame. A row is out of frame if it is both
not a peer of the current row and also outside of the bounds specified
by start_value and end_value. Note that if neither start_value or
end_value are specified, then GROUPS will always produce the same
results as RANGE. So UNBOUNDED PRECEDING AND CURRENT ROW, or CURRENT
ROW AND UNBOUNDED FOLLOWING produce the same results in GROUPS and
RANGE mode (the syntax is slightly confusing as CURRENT ROW in these
modes includes peers of the actual current row).

The standard also defines an EXCLUDE GROUP option which excludes the
current row and any peers from the frame. This can be used in all
three modes, and is included in the patch.

== Performance Considerations ==

The code calculates the size of each window group for every partition
and stores this in a dynamic array. I chose 16 as the initial capacity
of the array, which doubles as needed. Real-world testing may show
that a lower or higher initial capacity is preferable for the majority
of use cases. The code also calls pfree on this array at the end of
each partition, to avoid memory hogging if there are many partitions.

== Testing ==

Tested on Windows with MinGW. All existing regression tests pass. New
tests and updated documentation is included. Tests show the results of
the GROUPS option and the EXCLUDE GROUP option also working in RANGE
and ROWS mode.

Attachment Content-Type Size
0001-window-groups-v1.patch application/octet-stream 68.4 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-12-04 14:12:55 Re: Would a BGW need shmem_access or database_connection to enumerate databases?
Previous Message Sergei Kornilov 2017-12-04 14:09:44 Re: using index or check in ALTER TABLE SET NOT NULL