Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
Date: 2023-06-08 09:11:17
Message-ID: CAMbWs49jxZkfSP-9rG0Go4W7=pXCgS7jcjYJEERGJF3SoOFAqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 8, 2023 at 7:37 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> What the attached patch does is process each WindowClause and removes
> any items from the PARTITION BY clause that are columns or expressions
> relating to redundant PathKeys.
>
> Effectively, this allows the nodeWindowAgg.c code which stops
> processing WindowAgg rows when the run condition is met to work as the
> PARTITION BY clause is completely removed in the case of the above
> query. Removing the redundant PARTITION BY items also has the added
> benefit of not having to needlessly check if the next row belongs to
> the same partition as the last row. For the above, that check is a
> waste of time as all rows have relkind = 'r'

This is a nice optimization. I reviewed it and here are my findings.

In create_windowagg_plan there is such comment that says

* ... Note: in principle, it's possible
* to drop some of the sort columns, if they were proved redundant by
* pathkey logic. However, it doesn't seem worth going out of our way to
* optimize such cases.

Since this patch removes any clauses from the wc->partitionClause for
redundant pathkeys, this comment seems outdated, at least for the sort
columns in partitionClause.

Also I'm wondering if we can do the same optimization to
wc->orderClause. I tested it with the query below and saw performance
gains.

create table t (a int, b int);
insert into t select 1,2 from generate_series(1,100000)i;
analyze t;

explain analyze
select * from
(select a, b, rank() over (PARTITION BY a order by b) rank
from t where b = 2)
where a = 1 and rank <= 10;

With and without this optimization to wc->orderClause the execution time
is 67.279 ms VS. 119.120 ms (both best of 3).

I notice you comment in the patch that doing this is unsafe because it
would change the semantics of peer rows during execution. Would you
please elaborate on that?

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2023-06-08 09:41:35 Re: Do we want a hashset type?
Previous Message Daniel Verite 2023-06-08 09:03:25 Re: Order changes in PG16 since ICU introduction