Remove WindowClause PARTITION BY items belonging to redundant pathkeys

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
Date: 2023-06-07 23:37:13
Message-ID: CAApHDvo2ji+hdxrxfXtRtsfSVw3to2o1nCO20qimw0dUGK8hcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Recently Markus Winand pointed out to me that the PG15 changes made in
[1] to teach the query planner about monotonic window functions
improved the situation for PostgreSQL on his feature/optimization
timeline for PostgreSQL. These can be seen in [2].

Unfortunately, if you look at the timeline in [2], we're not quite on
green just yet per Markus's "Not with partition by clause (see below)"
caveat. This is because nodeWindowAgg.c's use_pass_through code must
be enabled when the WindowClause has a PARTITION BY clause.

The reason for this is that we can't just stop spitting out rows from
the WindowAgg when one partition is done as we still need to deal with
rows from any subsequent partitions and we can only get to those by
continuing to read rows until we find rows belonging to the next
partition.

There is however a missed optimisation here when there is a PARTITION
BY clause, but also some qual exists for the column(s) mentioned in
the partition by clause that makes it so only one partition can exist.
A simple example of that is in the following:

EXPLAIN
SELECT *
FROM
(SELECT
relkind,
pg_relation_size(oid) size,
rank() OVER (PARTITION BY relkind ORDER BY pg_relation_size(oid) DESC
) rank
FROM pg_class)
WHERE relkind = 'r' AND rank <= 10;

(the subquery may be better imagined as a view)

Here, because of the relkind='r' qual being pushed down into the
subquery, effectively that renders the PARTITION BY relkind clause
redundant.

What the attached patch does is process each WindowClause and removes
any items from the PARTITION BY clause that are columns or expressions
relating to redundant PathKeys.

Effectively, this allows the nodeWindowAgg.c code which stops
processing WindowAgg rows when the run condition is met to work as the
PARTITION BY clause is completely removed in the case of the above
query. Removing the redundant PARTITION BY items also has the added
benefit of not having to needlessly check if the next row belongs to
the same partition as the last row. For the above, that check is a
waste of time as all rows have relkind = 'r'

I passed the patch along to Markus and he kindly confirmed that we're
now green for this particular optimisation.

I'll add this patch to the July commitfest.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9d9c02ccd
[2] https://use-the-index-luke.com/sql/partial-results/window-functions

Attachment Content-Type Size
remove_redundant_partition_by_items.patch text/plain 3.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-06-07 23:47:26 Re: win32ver data in meson-built postgres.exe
Previous Message Thomas Munro 2023-06-07 23:37:00 Re: Let's make PostgreSQL multi-threaded