Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-10 05:23:50
Message-ID: CAApHDvo2y9S2AO-BPYo7gMPYD0XE2Lo-KFLnqX80fcftqBCcyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
> Do we have any pending items for this patch now?

I'm just wondering if not trying this when the query has a DISTINCT
clause is a copout. What I wanted to avoid was doing additional
sorting work for WindowAgg just to have it destroyed by Hash
Aggregate. I'm now wondering if adding both the original
slightly-less-sorted path plus the new slightly-more-sorted path then
if distinct decides to Hash Aggregate then it'll still be able to pick
the cheapest input path to do that on. Unfortunately, our sort
costing just does not seem to be advanced enough to know that sorting
by fewer columns might be cheaper, so adding the additional path is
likely just going to result in add_path() ditching the old
slightly-less-sorted path due to the new slightly-more-sorted path
having better pathkeys. So, we'd probably be wasting our time if we
added both paths with the current sort costing code.

# explain analyze select * from pg_Class order by relkind,relname;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=36.01..37.04 rows=412 width=273) (actual
time=0.544..0.567 rows=412 loops=1)
Sort Key: relkind, relname
Sort Method: quicksort Memory: 109kB
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
Planning Time: 0.152 ms
Execution Time: 0.629 ms
(6 rows)

# explain analyze select * from pg_Class order by relkind;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=36.01..37.04 rows=412 width=273) (actual
time=0.194..0.218 rows=412 loops=1)
Sort Key: relkind
Sort Method: quicksort Memory: 109kB
-> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273)
(actual time=0.014..0.083 rows=412 loops=1)
Planning Time: 0.143 ms
Execution Time: 0.278 ms
(6 rows)

the total cost is the same for both of these, but the execution time
seems to vary quite a bit.

Maybe we should try and do this for DISTINCT queries if the
distinct_pathkeys match the orderby_pathkeys. That seems a little less
copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
likely that the ORDER BY might opt to use the Unique path for DISTINCT
since it'll already have the correct pathkeys. However, if the ORDER
BY has fewer columns then it might be cheaper to Hash Aggregate and
then sort all over again, especially so when the DISTINCT removes a
large proportion of the rows.

Ideally, our sort costing would just be better, but I think that
raises the bar a little too high to start thinking of making
improvements to that for this patch.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brar Piening 2023-01-10 05:28:10 Re: doc: add missing "id" attributes to extension packaging page
Previous Message Tom Lane 2023-01-10 05:22:53 Re: doc: add missing "id" attributes to extension packaging page