Re: Todo: Teach planner to evaluate multiple windows in the optimal order

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com>
Cc: pghackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
Date: 2023-01-08 22:18:58
Message-ID: CAApHDvpigtPVJOJ4EmWOAODtPTccz=L_gdPXDVgG2Z=iSPfNmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 9 Jan 2023 at 06:17, Ankit Kumar Pandey <itsankitkp(at)gmail(dot)com> wrote:
> I have addressed all points 1-6 in the attached patch.

A few more things:

1. You're still using the 'i' variable in the foreach loop.
foreach_current_index() will work.

2. I think the "index" variable needs a better name. sort_pushdown_idx maybe.

3. I don't think you need to set "index" on every loop. Why not just
set it to foreach_current_index(l) - 1; break;

4. You're still setting orderby_pathkeys in the foreach loop. That's
already been set above and it won't have changed.

5. I don't think there's any need to check pathkeys_contained_in() in
the foreach loop anymore. With #3 the index will be -1 if the
optimisation cannot apply. You could likely also get rid of
try_sort_pushdown too and just make the condition "if
(sort_pushdown_idx == foreach_current_index(l))". I'm a little unsure
why there's still the is_sorted check there. Shouldn't that always be
false now that you're looping until the pathkeys don't match in the
foreach_reverse loop?

Correct me if I'm wrong as I've not tested, but I think the new code
in the foreach loop can just become:

if (sort_pushdown_idx == foreach_current_index(l))
{
Assert(!is_sorted);
window_pathkeys = orderby_pathkeys;
is_sorted = pathkeys_count_contained_in(window_pathkeys,
path->pathkeys, &presorted_keys);
}

> I have one doubt regarding runCondition, do we only need to ensure
> that window function which has subset sort clause of main query should
> not have runCondition or none of the window functions should not contain
> runCondition? I have gone with later case but wanted to clarify.

Actually, maybe it's ok just to check the top-level WindowClause for
runConditions. It's only that one that'll filter rows. That probably
simplifies the code quite a bit. Lower-level runConditions only serve
to halt the evaluation of WindowFuncs when the runCondition is no
longer met.

>
>
> > Also, do you want to have a go at coding up the sort bound pushdown
> > too? It'll require removing the limitCount restriction and adjusting
> > ExecSetTupleBound() to recurse through a WindowAggState. I think it's
> > pretty easy. You could try it then play around with it to make sure it
> > works and we get the expected performance.
>
> I tried this in the patch but kept getting `retrieved too many tuples in
> a bounded sort`.
>
> Added following code in ExecSetTupleBound which correctly found sortstate
>
> and set bound value.
>
> else if(IsA(child_node, WindowAggState))
>
> {
>
> WindowAggState *winstate = (WindowAggState *) child_node;
>
> if (outerPlanState(winstate))
>
> ExecSetTupleBound(tuples_needed, outerPlanState(winstate));
>
> }
>
> I think problem is that are not using limit clause inside window
> function (which
> may need to scan all tuples) so passing bound value to
> WindowAggState->sortstate
> is not working as we might expect. Or maybe I am getting it wrong? I was
> trying to
> have top-N sort for limit clause if orderby pushdown happens.

hmm, perhaps the Limit would have to be put between the WindowAgg and
Sort for it to work. Maybe that's more complexity than it's worth.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-08 22:39:19 Re: Fixing a couple of buglets in how VACUUM sets visibility map bits
Previous Message Alexander Korotkov 2023-01-08 22:07:45 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()