Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-03 02:51:36
Message-ID: CAApHDvp=r1LnEKCmWCYaruMPL-jP4j_sdc8yeFYwaDT1ac5GsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 26 Dec 2022 at 02:04, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
> Point #1
>
> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
> sorts. We also perform 3.

This shouldn't be too hard to do. See the code in
select_active_windows(). You'll likely want to pay attention to the
DISTINCT pathkeys if they exist and just use the ORDER BY pathkeys if
the query has no DISTINCT clause. DISTINCT is evaluated after Window
and before ORDER BY.

One idea to implement this would be to adjust the loop in
select_active_windows() so that we record any WindowClauses which have
the pathkeys contained in the ORDER BY / DISTINCT pathkeys then record
those separately and append those onto the end of the actives array
after the sort.

I do think you'll likely want to put any WindowClauses which have
pathkeys which are a true subset or true superset of the ORDER BY /
DISTINCT pathkeys last. If they're a superset then we won't need to
perform any additional ordering for the DISTINCT / ORDER BY clause.
If they're a subset then we might be able to perform an Incremental
Sort, which is likely much cheaper than a full sort. The existing
code should handle that part. You just need to make
select_active_windows() more intelligent.

You might also think that we could perform additional optimisations
and also adjust the ORDER BY clause of a WindowClause if it contains
the pathkeys of the DISTINCT / ORDER BY clause. For example:

SELECT *,row_number() over (order by a,b) from tab order by a,b,c;

However, if you were to adjust the WindowClauses ORDER BY to become
a,b,c then you could produce incorrect results for window functions
that change their result based on peer rows.

Note the difference in results from:

create table ab(a int, b int);
insert into ab select x,y from generate_series(1,5) x, generate_Series(1,5)y;

select a,b,count(*) over (order by a) from ab order by a,b;
select a,b,count(*) over (order by a,b) from ab order by a,b;

> and Point #2
>
> Teach planner to decide which window to evaluate first based on costs.
> Currently the first window in the query is evaluated first, there may be no
> index to help sort the first window, but perhaps there are for other windows
> in the query. This may allow an index scan instead of a seqscan -> sort.

What Tom wrote about that in the first paragraph of [1] still applies.
The problem is that if the query contains many joins that to properly
find the cheapest way of executing the query we'd have to perform the
join search once for each unique sort order of each WindowClause.
That's just not practical to do from a performance standpoint. The
join search can be very expensive. There may be something that could
be done to better determine the most likely candidate for the first
WindowClause using some heuristics, but I've no idea what those would
be. You should look into point #1 first. Point #2 is significantly
more difficult to solve in a way that would be acceptable to the
project.

David

[1] https://www.postgresql.org/message-id/11535.1230501658%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-01-03 03:08:36 Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Previous Message vignesh C 2023-01-03 02:46:43 Re: CFM for 2023-01