Re: Window Function "Run Conditions"

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Window Function "Run Conditions"
Date: 2022-04-07 02:36:37
Message-ID: CAApHDvpou0r+AOT9JZZEsiANQY=w=BTiRXgWR230NzOG7D3VUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 6 Apr 2022 at 00:59, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> On Tue, Apr 5, 2022 at 7:49 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> Yeah, there is more performance to be had than even what you've done
>> there. There's no reason really for spool_tuples() to do
>> tuplestore_puttupleslot() when we're not in run mode.
>
>
> Yeah, this is a great idea.

I've attached an updated patch that does most of what you mentioned.
To make this work I had to add another state to the WindowAggStatus.
This new state is what the top-level WindowAgg will move into when
there's a PARTITION BY clause and the run condition becomes false.
The new state is named WINDOWAGG_PASSTHROUGH_STRICT, which does all
that WINDOWAGG_PASSTHROUGH does plus skips tuplestoring tuples during
the spool. We must still spool those tuples when we're not the
top-level WindowAgg so that we can send those out to any calling
WindowAgg nodes. They'll need those so they return the correct result.

This means that for intermediate WindowAgg nodes, when the
runcondition becomes false, we only skip evaluation of WindowFuncs.
WindowAgg nodes above us cannot reference these, so there's no need to
evaluate them, plus, if there's a run condition then these tuples will
be filtered out in the final WindowAgg node.

For the top-level WindowAgg node, when the run condition becomes false
we can save quite a bit more work. If there's no PARTITION BY clause,
then we're done. Just return NULL. When there is a PARTITION BY
clause we move into WINDOWAGG_PASSTHROUGH_STRICT which allows us to
skip both the evaluation of WindowFuncs and also allows us to consume
tuples from our outer plan until we get a tuple belonging to another
partition. No need to tuplestore these tuples as they're being
filtered out.

Since intermediate WindowAggs cannot filter tuples, all the filtering
must occur in the top-level WindowAgg. This cannot be done by way of
the run condition as the run condition is special as when it becomes
false, we don't check again to see if it's become true. A sort node
between the WindowAggs can change the tuple order (i.e previously
monotonic values may no longer be monotonic) so it's only valid to
evaluate the run condition that's meant for the WindowAgg node it was
intended for. To filter out the tuples that don't match the run
condition from intermediate WindowAggs in the top-level WindowAgg,
what I've done is introduced quals for WindowAgg nodes. This means
that we can now see Filter in EXPLAIN For WindowAgg and "Rows Removed
by Filter".

Why didn't I just do the filtering in the outer query like was
happening before? The problem is that when we push the quals down
into the subquery, we don't yet have knowledge of which order that the
WindowAggs will be evaluated in. Only run conditions from
intermediate WindowAggs will ever make it into the Filter, and we
don't know which one the top-level WindowAgg will be until later in
planning. To do the filtering in the outer query we'd need to push
quals back out the subquery again. It seems to me to be easier and
better to filter them out lower down in the plan.

Since the top-level WindowAgg node can now filter tuples, the executor
node had to be given a for(;;) loop so that it goes around again for
another tuple after it filters a tuple out.

I've also updated the commit message which I think I've made quite
clear about what we optimise and how it's done.

> And I would suggest the below fastpath for this feature.
> - if (check_and_push_window_quals(subquery, rte, rti, clause))
> + if (!subquery->hasWindowFuncs || check_and_push_window_quals(subquery, rte, rti, clause))

Good idea. Thanks!

David

Attachment Content-Type Size
v6-0001-Teach-planner-and-executor-about-monotonic-window.patch text/plain 78.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-04-07 02:55:45 Re: Skipping logical replication transactions on subscriber side
Previous Message Andres Freund 2022-04-07 01:58:52 Re: shared-memory based stats collector - v70