Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-08 17:17:19
Message-ID: 8d0b24ce-f8ba-cd63-82ae-be853b4067e8@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 08/01/23 16:33, David Rowley wrote:

> On Sun, 8 Jan 2023 at 23:21, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>> Please find attached patch with addressed issues mentioned before.
>
> Here's a quick review:
>
> 1. You can use foreach_current_index(l) to obtain the index of the list element.
>
> 2. I'd rather see you looping backwards over the list in the first
> loop. I think it'll be more efficient to loop backwards as you can
> just break out the loop when you see the pathkeys are not contained in
> the order by pathkreys. When the optimisation does not apply that
> means you only need to look at the last item in the list. You could
> maybe just invent foreach_reverse() for this purpose and put it in
> pg_list.h. That'll save having to manually code up the loop.
>
> 3. I don't think you should call the variable
> enable_order_by_pushdown. We've a bunch of planner related GUCs that
> start with enable_. That might cause a bit of confusion. Maybe just
> try_sort_pushdown.
>
> 4. You should widen the scope of orderby_pathkeys and set it within
> the if (enable_order_by_pushdown) scope. You can reuse this again in
> the 2nd loop too. Just initialise it to NIL
>
> 5. You don't seem to be checking all WindowClauses for a runCondtion.
> If you do #2 above then you can start that process in the initial
> reverse loop so that you've checked them all by the time you get
> around to that WindowClause again in the 2nd loop.
>
> 6. The test with "+-- Do not perform additional sort if column is
> presorted". I don't think "additional" is the correct word here. I
> think you want to ensure that we don't push down the ORDER BY below
> the WindowAgg for this case. There is no ability to reduce the sorts
> here, only move them around, which we agreed we don't want to do for
> this case.

I have addressed all points 1-6 in the attached patch.

I have one doubt regarding runCondition, do we only need to ensure

that window function which has subset sort clause of main query should

not have runCondition or none of the window functions should not contain

runCondition? I have gone with later case but wanted to clarify.

> Also, do you want to have a go at coding up the sort bound pushdown
> too? It'll require removing the limitCount restriction and adjusting
> ExecSetTupleBound() to recurse through a WindowAggState. I think it's
> pretty easy. You could try it then play around with it to make sure it
> works and we get the expected performance.

I tried this in the patch but kept getting `retrieved too many tuples in
a bounded sort`.

Added following code in ExecSetTupleBound which correctly found sortstate

and set bound value.

else if(IsA(child_node, WindowAggState))

{

WindowAggState *winstate = (WindowAggState *) child_node;

if (outerPlanState(winstate))

ExecSetTupleBound(tuples_needed, outerPlanState(winstate));

}

I think problem is that are not using limit clause inside window
function (which

may need to scan all tuples) so passing bound value to
WindowAggState->sortstate

is not working as we might expect. Or maybe I am getting it wrong? I was
trying to

have top-N sort for limit clause if orderby pushdown happens.

> You'll likely want to add a few more rows than the last performance test you did or run the query
> with pgbench. Running a query once that only takes 1-2ms is likely not
> a reliable way to test the performance of something.

I did some profiling.

CREATE TABLE empsalary1 (
depname varchar,
empno bigint,
salary int,
enroll_date date
);
INSERT INTO empsalary1(depname, empno, salary, enroll_date)
SELECT string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1000), '')
AS depname, generate_series(1, 10000000) AS empno, ceil (random()*10000)::integer AS salary
, NOW() + (random() * (interval '90 days')) + '30 days' AS enroll_date;

1) Optimization case

EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary1
ORDER BY depname, empno, enroll_date;

EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary1
ORDER BY depname, empno;

In patched version:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
-----------------------
WindowAgg (cost=93458.04..93458.08 rows=1 width=61) (actual time=149996.006..156756.995 rows=10000000 loops=1)
-> WindowAgg (cost=93458.04..93458.06 rows=1 width=57) (actual time=108559.126..135892.188 rows=10000000 loops=1)
-> Sort (cost=93458.04..93458.04 rows=1 width=53) (actual time=108554.213..112564.168 rows=10000000 loops=1)
Sort Key: depname, empno, enroll_date
Sort Method: external merge Disk: 645856kB
-> WindowAgg (cost=93458.01..93458.03 rows=1 width=53) (actual time=30386.551..62357.669 rows=10000000 lo
ops=1)
-> Sort (cost=93458.01..93458.01 rows=1 width=45) (actual time=23260.104..26313.395 rows=10000000 l
oops=1)
Sort Key: enroll_date DESC
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..93458.00 rows=1 width=45) (actual time=0.032..4833.603
rows=10000000 loops=1)
Planning Time: 4.693 ms
Execution Time: 158161.281 ms

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
-------------
WindowAgg (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=40565.305..46598.984 rows=10000000 loops=1)
-> Sort (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=23411.837..27467.962 rows=10000000 loops=1)
Sort Key: depname, empno
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=39) (actual time=5.095..5751.675 rows=10000
000 loops=1)
Planning Time: 0.099 ms
Execution Time: 47415.926 ms

In master:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
-------------------------------------
Incremental Sort (cost=3992645.36..4792645.79 rows=10000006 width=59) (actual time=147130.132..160985.373 rows=10000000
loops=1)
Sort Key: depname, empno, enroll_date
Presorted Key: depname, empno
Full-sort Groups: 312500 Sort Method: quicksort Average Memory: 28kB Peak Memory: 28kB
-> WindowAgg (cost=3992645.31..4342645.52 rows=10000006 width=59) (actual time=147129.936..154023.147 rows=10000000 l
oops=1)
-> WindowAgg (cost=3992645.31..4192645.43 rows=10000006 width=55) (actual time=104665.289..133089.188 rows=1000
0000 loops=1)
-> Sort (cost=3992645.31..4017645.33 rows=10000006 width=51) (actual time=104665.257..108710.282 rows=100
00000 loops=1)
Sort Key: depname, empno
Sort Method: external merge Disk: 645856kB
-> WindowAgg (cost=1971370.63..2146370.74 rows=10000006 width=51) (actual time=28314.300..59737.949
rows=10000000 loops=1)
-> Sort (cost=1971370.63..1996370.65 rows=10000006 width=43) (actual time=21190.188..24098.59
6 rows=10000000 loops=1)
Sort Key: enroll_date DESC
Sort Method: external merge Disk: 528448kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=43) (actual time=0.
630..5317.862 rows=10000000 loops=1)
Planning Time: 0.982 ms
Execution Time: 163369.242 ms
(16 rows)

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
-------------------
Incremental Sort (cost=3787573.31..3912573.41 rows=10000006 width=39) (actual time=51547.195..53950.034 rows=10000000 lo
ops=1)
Sort Key: depname, empno
Presorted Key: depname
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB
Pre-sorted Groups: 1 Sort Method: external merge Average Disk: 489328kB Peak Disk: 489328kB
-> WindowAgg (cost=1903015.63..2078015.74 rows=10000006 width=39) (actual time=33413.954..39771.262 rows=10000000 loo
ps=1)
-> Sort (cost=1903015.63..1928015.65 rows=10000006 width=39) (actual time=18991.129..21992.353 rows=10000000 lo
ops=1)
Sort Key: depname
Sort Method: external merge Disk: 528456kB
-> Seq Scan on empsalary1 (cost=0.00..193458.06 rows=10000006 width=39) (actual time=1.300..5269.729 rows
=10000000 loops=1)
Planning Time: 4.506 ms
Execution Time: 54768.697 ms

2) No optimization case:

EXPLAIN (ANALYZE)
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary
FROM empsalary1
ORDER BY enroll_date;

Patch:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Sort  (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual
time=57863.173..60976.324 rows=10000000 loops=1)
   Sort Key: enroll_date
   Sort Method: external merge  Disk: 528448kB
   ->  WindowAgg  (cost=850613.62..2190279.21 rows=10000006 width=43)
(actual time=7478.966..42502.541 rows=10000000 loops
=1)
         ->  Gather Merge  (cost=850613.62..2015279.11 rows=10000006
width=43) (actual time=7478.935..18037.001 rows=10000
000 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=849613.60..860030.27 rows=4166669
width=43) (actual time=7349.101..9397.713 rows=3333333 lo
ops=3)
                     Sort Key: depname, empno
                     Sort Method: external merge  Disk: 181544kB
                     Worker 0:  Sort Method: external merge  Disk: 169328kB
                     Worker 1:  Sort Method: external merge  Disk: 177752kB
                     ->  Parallel Seq Scan on empsalary1
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.213.
.2450.635 rows=3333333 loops=3)
 Planning Time: 0.100 ms
 Execution Time: 63341.783 ms

master:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Sort  (cost=3968191.79..3993191.80 rows=10000006 width=43) (actual
time=54097.880..57000.806 rows=10000000 loops=1)
   Sort Key: enroll_date
   Sort Method: external merge  Disk: 528448kB
   ->  WindowAgg  (cost=850613.62..2190279.21 rows=10000006 width=43)
(actual time=7075.245..39200.756 rows=10000000 loops
=1)
         ->  Gather Merge  (cost=850613.62..2015279.11 rows=10000006
width=43) (actual time=7075.217..15988.922 rows=10000
000 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=849613.60..860030.27 rows=4166669
width=43) (actual time=6993.974..8799.701 rows=3333333 lo
ops=3)
                     Sort Key: depname, empno
                     Sort Method: external merge  Disk: 171904kB
                     Worker 0:  Sort Method: external merge  Disk: 178496kB
                     Worker 1:  Sort Method: external merge  Disk: 178224kB
                     ->  Parallel Seq Scan on empsalary1
(cost=0.00..135124.69 rows=4166669 width=43) (actual time=0.044.
.2683.598 rows=3333333 loops=3)
 Planning Time: 5.718 ms
 Execution Time: 59188.469 ms
(15 rows)

Master and patch have same performance as plan is same.

pgbench (this is to find average performance):

create table empsalary2 as select * from empsalary1 limit 1000;
-------------------------------------------------------------------
test.sql
SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
sum(salary) OVER (PARTITION BY depname) depsalary,
count(*) OVER (ORDER BY enroll_date DESC) c
FROM empsalary2
ORDER BY depname, empno, enroll_date;

SELECT empno,
depname,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary2
ORDER BY depname, empno;
----------------------------------------------------------------------

/usr/local/pgsql/bin/pgbench -d test -c 10 -j 4 -t 1000 -f test.sql

Patch:

transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 55.262 ms
initial connection time = 8.480 ms
tps = 180.957685 (without initial connection time)

Master:

transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 4
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
number of failed transactions: 0 (0.000%)
latency average = 60.489 ms
initial connection time = 7.069 ms
tps = 165.320205 (without initial connection time)

TPS of patched version is higher than that of master's for same set of
queries.

where optimization is performed.

--
Regards,
Ankit Kumar Pandey

Attachment Content-Type Size
better_windowclause_sorting_dgr-v4.patch text/x-patch 14.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mason Sharp 2023-01-08 18:33:40 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Previous Message Ankit Kumar Pandey 2023-01-08 17:05:08 Re: Todo: Teach planner to evaluate multiple windows in the optimal order