Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-10 19:17:25
Message-ID: 2f9f632d-3297-09dd-1d01-af3e0961c446@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 10/01/23 10:53, David Rowley wrote:

> the total cost is the same for both of these, but the execution time
> seems to vary quite a bit.

This is really weird, I tried it different ways (to rule out any issues
due to

caching) and execution time varied in spite of having same cost.

> Maybe we should try and do this for DISTINCT queries if the
> distinct_pathkeys match the orderby_pathkeys. That seems a little less
> copout-ish. If the ORDER BY is the same as the DISTINCT then it seems
> likely that the ORDER BY might opt to use the Unique path for DISTINCT
> since it'll already have the correct pathkeys.

> However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and
> then sort all over again, especially so when the DISTINCT removes a
> large proportion of the rows.

Isn't order by pathkeys are always fewer than distinct pathkeys?

distinct pathkeys = order by pathkeys + window functions pathkeys

Again, I got your point which that it is okay to pushdown order by clause

if distinct is doing unique sort. But problem is (atleast from what I am
facing),

distinct is not caring about pushed down sortkeys, it goes with hashagg
or unique

with some other logic (mentioned below).

Consider following (with distinct clause restriction removed)

if (parse->distinctClause)
{
List* distinct_pathkeys = make_pathkeys_for_sortclauses(root, parse->distinctClause, root->processed_tlist);
if (!compare_pathkeys(distinct_pathkeys, orderby_pathkeys)==1) // distinct key > order by key
skip = true; // this is used to skip order by pushdown

}

CASE #1:

explain (costs off) select distinct a,b, min(a) over (partition by a), sum (a) over (partition by a) from abcd order by a,b;
QUERY PLAN
-----------------------------------------------------------
Sort
Sort Key: a, b
-> HashAggregate
Group Key: a, b, min(a) OVER (?), sum(a) OVER (?)
-> WindowAgg
-> Sort
Sort Key: a, b
-> Seq Scan on abcd
(8 rows)

explain (costs off) select distinct a,b,c, min(a) over (partition by a), sum (a) over (partition by a) from abcd order by a,b,c;
QUERY PLAN
--------------------------------------------------------------
Sort
Sort Key: a, b, c
-> HashAggregate
Group Key: a, b, c, min(a) OVER (?), sum(a) OVER (?)
-> WindowAgg
-> Sort
Sort Key: a, b, c
-> Seq Scan on abcd
(8 rows)

No matter how many columns are pushed down, it does hashagg.

On the other hand:

CASE #2:

EXPLAIN (costs off) SELECT DISTINCT depname, empno, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno;
QUERY PLAN
----------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: depname, empno, (min(salary) OVER (?)), (sum(salary) OVER (?))
-> WindowAgg
-> Sort
Sort Key: depname, empno
-> Seq Scan on empsalary
(7 rows)

EXPLAIN (costs off) SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?))
-> WindowAgg
-> Sort
Sort Key: depname, empno, enroll_date
-> Seq Scan on empsalary
(7 rows)

It keeps doing Unique.

In both of the cases, compare_pathkeys(distinct_pathkeys,
orderby_pathkeys) returns 1

Looking bit further, planner is choosing things correctly.

I could see cost of unique being higher in 1st case and lower in 2nd case.

But the point is, if sort for orderby is pushdown, shouldn't there be
some discount on

cost of Unique sort (so that there is more possibility of it being
favorable compared to HashAgg in certain cases)?

Again, cost of Unqiue node is taken as cost of sort node as it is, but

for HashAgg, new cost is being computed. If we do incremental sort here
(for unique node),

as we have pushed down orderby's, unique cost could be reduced and our
optimization could

be made worthwhile (I assume this is what you intended here) in case of
distinct.

Eg:

EXPLAIN SELECT DISTINCT depname, empno, enroll_date, min(salary) OVER (PARTITION BY depname) depminsalary,sum(salary) OVER (PARTITION BY depname) depsalary
FROM empsalary
ORDER BY depname, empno, enroll_date;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Unique  (cost=1.63..1.78 rows=10 width=56)
   ->  Sort  (cost=1.63..1.66 rows=10 width=56)
         Sort Key: depname, empno, enroll_date, (min(salary) OVER (?)), (sum(salary) OVER (?))
         ->  WindowAgg  (cost=1.27..1.47 rows=10 width=56)
               ->  Sort  (cost=1.27..1.29 rows=10 width=48)
                     Sort Key: depname, empno, enroll_date
                     ->  Seq Scan on empsalary  (cost=0.00..1.10 rows=10 width=48)

depname, empno, enroll_date are presorted but still strict sorting is
done on all columns.

Additionally,

> the total cost is the same for both of these, but the execution time
> seems to vary quite a bit.

Even if I pushdown one or two path keys, end result is same cost (which
isn't helping).

--
Regards,
Ankit Kumar Pandey

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Luzanov 2023-01-10 19:18:16 Re: psql: Add role's membership options to the \du+ command
Previous Message Andres Freund 2023-01-10 19:14:49 Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE