Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
Date: 2023-06-09 00:13:02
Message-ID: CAApHDvpc=Gu4J47iSzRENGfbxDeYnBhDqzs4_+Z+3LE3iswq0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for having a look at this.

On Thu, 8 Jun 2023 at 21:11, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> On Thu, Jun 8, 2023 at 7:37 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> What the attached patch does is process each WindowClause and removes
>> any items from the PARTITION BY clause that are columns or expressions
>> relating to redundant PathKeys.

> Also I'm wondering if we can do the same optimization to
> wc->orderClause. I tested it with the query below and saw performance
> gains.

After looking again at nodeWindowAgg.c, I think it might be possible
to do a bit more work and apply this to ORDER BY items too. Without
an ORDER BY clause, all rows in the partition are peers of each other,
and if the ORDER BY column is redundant due to belonging to a
redundant pathkey, then those rows must also be peers too since the
redundant pathkey must mean all rows have an equal value in the
redundant column.

However, there is a case where we must be much more careful. The
comment you highlighted in create_windowagg_plan() does mention this.
It reads "we must *not* remove the ordering column for RANGE OFFSET
cases".

The following query can't work when the WindowClause has no ORDER BY column.

postgres=# select relname,sum(pg_relation_size(oid)) over (range
between 10 preceding and current row) from pg_Class;
ERROR: RANGE with offset PRECEDING/FOLLOWING requires exactly one
ORDER BY column
LINE 1: select relname,sum(pg_relation_size(oid)) over (range between...

It might be possible to make adjustments in nodeWindowAgg.c to have
the equality checks come out as true when there is no ORDER BY.
update_frameheadpos() is one location that would need to be adjusted.
It would need further study to ensure we don't accidentally break
anything. I've not done that study, so won't be adjusting the patch
for now.

I've attached an updated patch which updates the outdated comment
which you highlighted. I ended up moving the mention of removing
redundant columns into make_pathkeys_for_window() as it seemed a much
more relevant location to mention this optimisation.

David

Attachment Content-Type Size
remove_redundant_partition_by_items_v2.patch application/octet-stream 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Smith 2023-06-09 00:20:18 Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15
Previous Message Andres Freund 2023-06-09 00:06:00 Re: index prefetching