Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-07 11:58:03
Message-ID: CAApHDvpxy0W3MHrxSaE4YGyHVgRi5qpDQ+qbZqZBEKbtNpz+Wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Your email client seems to be adding additional vertical space to your
emails. I've removed the additional newlines in the quotes. Are you
able to fix the client so it does not do that?

On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>
> On 07/01/23 09:58, David Rowley wrote:
> > You also don't seem to be considering the fact that the query might
> > have a DISTINCT clause.
>
> Major reason for this was that I am not exactly aware of what distinct
> clause means (especially in
>
> context of window functions) and how it is different from other
> sortClauses (partition, order by, group).

I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT.
If you look in the grouping_planner() function, you'll see that
create_distinct_paths() is called between create_window_paths() and
create_ordered_paths().

> Yes, this is a fair point. Multiple sort is actually beneficial in cases
> like this, perhaps limits clause and runCondition should be no op too?

I'm not sure what you mean by "no op". Do you mean not apply the optimization?

> > I think the patch should also be using pathkeys_contained_in() and
> > Lists of pathkeys rather than concatenating lists of SortGroupClauses
> > together. That should allow things to work correctly when a given
> > pathkey has become redundant due to either duplication or a Const in
> > the Eclass.
>
> Make sense, I actually duplicated that logic from
> make_pathkeys_for_window. We should make this changes there as well because
> if we have SELECT rank() OVER (PARTITION BY a ORDER BY a)
> (weird example but you get the idea), it leads to duplicates in
> window_sortclauses.

It won't lead to duplicate pathkeys. Look in
make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does.
Notice that it checks if the pathkey already exists in the list before
appending.

> Agree with runConditions part but for limit clause, row reduction happens
> at the last, so whether we use patched version or master version,
> none of sorts would benefit/degrade from that, right?

Maybe you're right. Just be aware that a sort done for a query with an
ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to
store the top-n tuples and that can significantly reduce the memory
required for sorting perhaps resulting in the sort fitting in memory
rather than spilling out to disk.

You might want to test this by having the leading sort column as an
INT, and then the 2nd one as a long text column of maybe around two
kilobytes. Make all the leading column values the same so that the
comparison for the text column is always performed. Make the LIMIT
small compared to the total number of rows, that should test the worse
case. Check the performance with and without the limitCount != NULL
part of the patch that disables the optimization for LIMIT.

> Is it okay
> to add comments in test cases? I don't see it much on existing cases
> so kind of reluctant to add but it makes intentions much more clear.

I think tests should always have a comment to state what they're
testing. Not many people seem to do that, unfortunately. The problem
with not stating what the test is testing is that if, for example, the
test is checking that the EXPLAIN output is showing a Sort, what if at
some point in the future someone adjusts some costing code and the
plan changes to an Index Scan. If there's no comment to state that
we're looking for a Sort plan, then the author of the patch that's
adjusting the costs might just think it's ok to change the expected
plan to an Index Scan. I've seen this problem occur even when the
comments *do* exist. There's just about no hope of such a test
continuing to do what it's meant to if the comments don't exist.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-01-07 12:26:34 Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE
Previous Message Dean Rasheed 2023-01-07 11:18:59 Re: add \dpS to psql