Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2022-12-25 13:04:38
Message-ID: 83d80853-a45c-d85c-68eb-59acfe7fb5fb@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While looking at one of the todo item in Window function, namely:

/Teach planner to evaluate multiple windows in the optimal order
Currently windows are always evaluated in the query-specified order./

From threads, relevant points.

Point #1

> In the above query Oracle 10g performs 2 sorts, DB2 and Sybase perform 3
> sorts. We also perform 3.
and Point #2
> Teach planner to decide which window to evaluate first based on costs.
> Currently the first window in the query is evaluated first, there may
> be no
> index to help sort the first window, but perhaps there are for other
> windows
> in the query. This may allow an index scan instead of a seqscan -> sort.
Repro:

select pg_catalog.version();

/version //
//----------------------------------------------------------------------------------------------------//
// PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
12.2.0-3ubuntu1) 12.2.0, 64-bit//
//(1 row)/

create table empsalary(depname text, empno int, salary int);
insert into empsalary values (select substr(md5(random()::text), 0, 25),
generate_series(1,10), generate_series(10000,12000));

explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary)
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;

/                                      QUERY PLAN //
//--------------------------------------------------------------------------------------//
// WindowAgg  (cost=289.47..324.48 rows=2001 width=49)//
//   ->  Sort  (cost=289.47..294.47 rows=2001 width=41)//
//         Sort Key: salary//
//         ->  WindowAgg  (cost=144.73..179.75 rows=2001 width=41)//
//               ->  Sort  (cost=144.73..149.73 rows=2001 width=33)//
//                     Sort Key: empno//
//                     ->  Seq Scan on empsalary (cost=0.00..35.01
rows=2001 width=33)//
//(7 rows)/

As it is seen, for case #1, issue looks like resolved and only 2 sorts
are performed.

For #2, index col ordering is changed.

create index idx_emp on empsalary (empno);
explain SELECT depname, SUM(salary) OVER (ORDER BY salary), SUM(salary)
OVER (ORDER BY empno) FROM empsalary ORDER BY salary;
/                                           QUERY PLAN //
//------------------------------------------------------------------------------------------------//
// WindowAgg  (cost=204.03..239.04 rows=2001 width=49)//
//   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)//
//         Sort Key: salary//
//         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)//
//               ->  Index Scan using idx_emp on empsalary 
(cost=0.28..64.29 rows=2001 width=33)//
//(5 rows)/

explain SELECT depname, SUM(salary) OVER (ORDER BY empno), SUM(salary)
OVER (ORDER BY salary) FROM empsalary ORDER BY salary;
/                                           QUERY PLAN //
//------------------------------------------------------------------------------------------------//
// WindowAgg  (cost=204.03..239.04 rows=2001 width=49)//
//   ->  Sort  (cost=204.03..209.03 rows=2001 width=41)//
//         Sort Key: salary//
//         ->  WindowAgg  (cost=0.28..94.31 rows=2001 width=41)//
//               ->  Index Scan using idx_emp on empsalary 
(cost=0.28..64.29 rows=2001 width=33)//
//(5 rows)/

In both cases, index scan is performed, which means this issue is
resolved as well.

Is this todo still relevant?

Further down the threads:

> I do think the patch has probably left some low-hanging fruit on the
> simpler end of the difficulty spectrum, namely when the window stuff
> requires only one ordering that could be done either explicitly or
> by an indexscan. That choice should ideally be done with a proper
> cost comparison taking any LIMIT into account. I think right now
> the LIMIT might not be accounted for, or might be considered even
> when it shouldn't be because another sort is needed anyway.

I am not sure if I understand this fully but does it means proposed todo
(to use indexes) should be refined to

teach planner to  take into account of cost model while deciding to use
index or not in window functions? Meaning not always go with index route
(modify point #2)?

--
Regards,
Ankit Kumar Pandey

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2022-12-25 16:20:16 [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Previous Message Alvaro Herrera 2022-12-25 13:01:36 Re: daitch_mokotoff module