Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-07 22:26:08
Message-ID: CAApHDvp8BpUEo_kQdGHWNPCjcmRWCdiy5p26SoQA4R6rinkaLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(your email client still seems broken)

On Sun, 8 Jan 2023 at 05:27, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>
>
> While writing test cases, I found that optimization do not happen for
> case #1
>
> (which is prime candidate for such operation) like
>
> EXPLAIN (COSTS OFF)
> SELECT empno,
> depname,
> min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,
> sum(salary) OVER (PARTITION BY depname) depsalary
> FROM empsalary
> ORDER BY depname, empno, enroll_date
>
> This happens because mutual exclusiveness of two operands (when number
> of window functions > 1) viz
>
> is_sorted and last activeWindow in the condition:
>
> ( !is_sorted && lnext(activeWindows, l) == NULL)
>
> For 2nd last window function, is_sorted is false and path keys get added.
>
> In next run (for last window function), is_sorted becomes true and whole
> optimization
>
> part is skipped.
>
> Note: Major issue that if I remove is_sorted from condition, even though
>
> path keys are added, it still do not perform optimization and works same
> as in master/unoptimized case.
>
> Perhaps adding path keys at last window function is not doing trick?
> Maybe we need to add pathkeys
>
> to all window functions which are subset of query's order by
> irrespective of being last or not?

You might need to have another loop before the foreach loop that loops
backwards through the WindowClauses and remembers the index of the
WindowClause which has pathkeys contained in the query's ORDER BY
pathkeys then apply the optimisation from that point in the main
foreach loop. Also, if the condition within the foreach loop which
checks when we want to apply this optimisation is going to be run > 1
time, then you should probably have boolean variable that's set
before the loop which saves if we're going to try to apply the
optimisation. That'll save from having to check things like if the
query has a LIMIT clause multiple times.

> Case #2:
>
> For presorted columns, eg
>
> CREATE INDEX depname_idx ON empsalary(depname);
> SET enable_seqscan=0;
> EXPLAIN (COSTS OFF)
> SELECT empno,
> min(salary) OVER (PARTITION BY depname) depminsalary
> FROM empsalary
> ORDER BY depname, empno;
>
> Is this correct plan:
>
> a)
>
> QUERY PLAN
> -------------------------------------------------------
> Incremental Sort
> Sort Key: depname, empno
> Presorted Key: depname
> -> WindowAgg
> -> Index Scan using depname_idx on empsalary
> (5 rows)
>
> or this:
>
> b) (Akin to Optimized version)
>
> QUERY PLAN
> -------------------------------------------------------
> WindowAgg
> -> Incremental Sort
> Sort Key: depname, empno
> Presorted Key: depname
> -> Index Scan using depname_idx on empsalary
> (5 rows)
>
> Patched version does (a) because of is_sorted condition.

a) looks like the best plan to me. What's the point of pushing the
sort below the WindowAgg in this case? The point of this optimisation
is to reduce the number of sorts not to push them as deep into the
plan as possible. We should only be pushing them down when it can
reduce the number of sorts. There's no reduction in the number of
sorts in the above plan.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-01-07 22:36:58 Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Previous Message Tom Lane 2023-01-07 22:17:32 Re: delayed initialization in worktable scan