Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-11 00:48:16
Message-ID: CAApHDvowcPrNY3kgAPhbFL0nZ1fCvUBcp3y1AjLm0TUP5N3cxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 11 Jan 2023 at 08:17, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>
>
> > On 10/01/23 10:53, David Rowley wrote:
>
> > the total cost is the same for both of these, but the execution time
> > seems to vary quite a bit.
>
> This is really weird, I tried it different ways (to rule out any issues
> due to
>
> caching) and execution time varied in spite of having same cost.
>
> > Maybe we should try and do this for DISTINCT queries if the
> > distinct_pathkeys match the orderby_pathkeys. That seems a little less
> > copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
> > likely that the ORDER BY might opt to use the Unique path for DISTINCT
> > since it'll already have the correct pathkeys.
>
> > However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and
> > then sort all over again, especially so when the DISTINCT removes a
> > large proportion of the rows.
>
> Isn't order by pathkeys are always fewer than distinct pathkeys?

Just thinking about this again, I remembered why I thought DISTINCT
was uninteresting to start with. The problem is that if the query has
WindowFuncs and also has a DISTINCT clause, then the WindowFunc
results *must* be in the DISTINCT clause and, optionally also in the
ORDER BY clause. There's no other place to write WindowFuncs IIRC.
Since we cannot pushdown the sort when the more strict version of the
pathkeys have WindowFuncs, then we must still perform the additional
sort if the planner chooses to do a non-hashed DISTINCT. The aim of
this patch is to reduce the total number of sorts, and I don't think
that's possible in this case as you can't have WindowFuncs in the
ORDER BY when they're not in the DISTINCT clause:

postgres=# select distinct relname from pg_Class order by row_number()
over (order by oid);
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: select distinct relname from pg_Class order by row_number() ...

Another type of query which is suboptimal still is when there's a
DISTINCT and WindowClause but no ORDER BY. We'll reorder the DISTINCT
clause so that the leading columns of the ORDER BY come first in
transformDistinctClause(), but we've nothing to do the same for
WindowClauses. It can't happen around when transformDistinctClause()
is called as we've yet to decide the WindowClause evaluation order,
so if we were to try to make that better it would maybe have to do in
the upper planner somewhere. It's possible it's too late by that time
to adjust the DISTINCT clause.

Here's an example of it.

# set enable_hashagg=0;
# explain select distinct relname,relkind,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=61.12..65.24 rows=412 width=73)
-> Sort (cost=61.12..62.15 rows=412 width=73)
Sort Key: relname, relkind, (count(*) OVER (?))
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12
rows=412 width=65)
(7 rows)

We can simulate the optimisation by swapping the column order in the
targetlist. Note the planner can use Incremental Sort (at least since
3c6fc5820, from about 2 hours ago)

# explain select distinct relkind,relname,count(*) over (partition by
relkind) from pg_Class;
QUERY PLAN
------------------------------------------------------------------------------------
Unique (cost=41.26..65.32 rows=412 width=73)
-> Incremental Sort (cost=41.26..62.23 rows=412 width=73)
Sort Key: relkind, relname, (count(*) OVER (?))
Presorted Key: relkind
-> WindowAgg (cost=36.01..43.22 rows=412 width=73)
-> Sort (cost=36.01..37.04 rows=412 width=65)
Sort Key: relkind
-> Seq Scan on pg_class (cost=0.00..18.12
rows=412 width=65)
(8 rows)

Not sure if we should be trying to improve that in this patch. I just
wanted to identify it as something else that perhaps could be done.
I'm not really all that sure the above query shape makes much sense in
the real world. Would anyone ever want to use DISTINCT on some results
containing WindowFuncs?

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-01-11 00:54:04 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Peter Geoghegan 2023-01-11 00:39:30 Re: Fixing a couple of buglets in how VACUUM sets visibility map bits