Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-09 08:34:08
Message-ID: b1df021b-e6c0-f26a-1d5a-bd07f765db62@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 09/01/23 03:48, David Rowley wrote:

> On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
>> I have addressed all points 1-6 in the attached patch.

> A few more things:

> 1. You're still using the 'i' variable in the foreach loop.
foreach_current_index() will work.

> 2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.

Done these (1 & 2)

> 3. I don't think you need to set "index" on every loop. Why not just
set it to foreach_current_index(l) - 1; break;

Consider this query

EXPLAIN (COSTS OFF)

SELECT empno,

depname,

min(salary) OVER (PARTITION BY depname ORDER BY empno) depminsalary,

sum(salary) OVER (PARTITION BY depname) depsalary,

count(*) OVER (ORDER BY enroll_date DESC) c

FROM empsalary

ORDER BY depname, empno, enroll_date;

Here, W1 = min(salary) OVER (PARTITION BY depname ORDER BY empno) W2 =
sum(salary) OVER (PARTITION BY depname)

W3 = count(*) OVER (ORDER BY enroll_date DESC)

(1,2,3 are winref).

activeWindows = [W3, W1, W2]

If we iterate from reverse and break at first occurrence, we will

break at W2 and add extra keys there, but what we want it to add

keys at W1 so that it gets spilled to W2 (as existing logic is designed to

carry over sorted cols first to last).

> 4. You're still setting orderby_pathkeys in the foreach loop. That's
already been set above and it won't have changed.

> 5. I don't think there's any need to check pathkeys_contained_in() in
the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))".

Done this.

Added pathkeys_contained_in as an assert, hope that's okay.

> I'm a little unsure why there's still the is_sorted check there.
Shouldn't that always be false now that you're looping until the pathkeys
don't match in the foreach_reverse loop?

Removing is_sorted causes issue if there is matching pathkey which is
presorted

eg this case

-- Do not perform sort pushdown if column is presorted
CREATE INDEX depname_idx ON empsalary(depname);
SET enable_seqscan=0;

EXPLAIN (COSTS OFF)
SELECT empno,
min(salary) OVER (PARTITION BY depname) depminsalary
FROM empsalary
ORDER BY depname, empno;

We can move this to if (try_sort_pushdown) block but it looks to me bit
ugly.

Nevertheless, it make sense to have it here, sort_pushdown_index should
point to exact

window function which needs to be modified. Having extra check (for
is_sorted) in 2nd foreach loop

adds ambiguity if we don't add it in first check.

foreach_reverse(l, activeWindows)
{
WindowClause *wc = lfirst_node(WindowClause, l);
orderby_pathkeys = make_pathkeys_for_sortclauses(root,root->parse->sortClause,root->processed_tlist);
window_pathkeys = make_pathkeys_for_window(root,wc,root->processed_tlist);
is_sorted = pathkeys_count_contained_in(window_pathkeys,path->pathkeys,&presorted_keys);
has_runcondition |= (wc->runCondition != NIL);
if (!pathkeys_contained_in(window_pathkeys, orderby_pathkeys) || has_runcondition)
break;
if(!is_sorted)
sort_pushdown_idx = foreach_current_index(l);
}

Tests passes on this so logically it is ok.

> Correct me if I'm wrong as I've not tested, but I think the new code
> in the foreach loop can just become:
>
> if (sort_pushdown_idx == foreach_current_index(l))
> {
> Assert(!is_sorted);
> window_pathkeys = orderby_pathkeys;
> is_sorted = pathkeys_count_contained_in(window_pathkeys,
> path->pathkeys, &presorted_keys);
> }

Depending on where we have is_sorted (as mentioned above) it looks a lot
like you mentioned.

Also, we can add Assert(pathkeys_contained_in(window_pathkeys,
orderby_pathkeys))

>> I have one doubt regarding runCondition, do we only need to ensure
>> that window function which has subset sort clause of main query should
>> not have runCondition or none of the window functions should not contain
>> runCondition? I have gone with later case but wanted to clarify.

> Actually, maybe it's ok just to check the top-level WindowClause for
> runConditions. It's only that one that'll filter rows. That probably
> simplifies the code quite a bit. Lower-level runConditions only serve
> to halt the evaluation of WindowFuncs when the runCondition is no
> longer met.

Okay, then this approach makes sense.

> hmm, perhaps the Limit would have to be put between the WindowAgg and
> Sort for it to work. Maybe that's more complexity than it's worth.

Yes, not specific to this change. It is more around allowing top-N sort in

window functions (in general). Once we have it there, then this could be
taken care of.

I have attached patch which fixes 1 & 2 and rearrange is_sorted.

Point #3 needs to be resolved (and perhaps another way to handle is_sorted)

Thanks,

--
Regards,
Ankit Kumar Pandey

Attachment Content-Type Size
better_windowclause_sorting_dgr-v5.patch text/x-patch 14.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-01-09 08:36:12 Re: Common function for percent placeholder replacement
Previous Message Amit Kapila 2023-01-09 08:21:00 Re: Unwanted file mode modification?