Re: Window Function "Run Conditions"

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Window Function "Run Conditions"
Date: 2022-04-07 03:45:56
Message-ID: CALNJ-vTaG0i+Sx4bruXDKNnppfMJS0nt7E=7Roo7VXAuxZFWiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 6, 2022 at 7:36 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Wed, 6 Apr 2022 at 00:59, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 5, 2022 at 7:49 PM David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> >> Yeah, there is more performance to be had than even what you've done
> >> there. There's no reason really for spool_tuples() to do
> >> tuplestore_puttupleslot() when we're not in run mode.
> >
> >
> > Yeah, this is a great idea.
>
> I've attached an updated patch that does most of what you mentioned.
> To make this work I had to add another state to the WindowAggStatus.
> This new state is what the top-level WindowAgg will move into when
> there's a PARTITION BY clause and the run condition becomes false.
> The new state is named WINDOWAGG_PASSTHROUGH_STRICT, which does all
> that WINDOWAGG_PASSTHROUGH does plus skips tuplestoring tuples during
> the spool. We must still spool those tuples when we're not the
> top-level WindowAgg so that we can send those out to any calling
> WindowAgg nodes. They'll need those so they return the correct result.
>
> This means that for intermediate WindowAgg nodes, when the
> runcondition becomes false, we only skip evaluation of WindowFuncs.
> WindowAgg nodes above us cannot reference these, so there's no need to
> evaluate them, plus, if there's a run condition then these tuples will
> be filtered out in the final WindowAgg node.
>
> For the top-level WindowAgg node, when the run condition becomes false
> we can save quite a bit more work. If there's no PARTITION BY clause,
> then we're done. Just return NULL. When there is a PARTITION BY
> clause we move into WINDOWAGG_PASSTHROUGH_STRICT which allows us to
> skip both the evaluation of WindowFuncs and also allows us to consume
> tuples from our outer plan until we get a tuple belonging to another
> partition. No need to tuplestore these tuples as they're being
> filtered out.
>
> Since intermediate WindowAggs cannot filter tuples, all the filtering
> must occur in the top-level WindowAgg. This cannot be done by way of
> the run condition as the run condition is special as when it becomes
> false, we don't check again to see if it's become true. A sort node
> between the WindowAggs can change the tuple order (i.e previously
> monotonic values may no longer be monotonic) so it's only valid to
> evaluate the run condition that's meant for the WindowAgg node it was
> intended for. To filter out the tuples that don't match the run
> condition from intermediate WindowAggs in the top-level WindowAgg,
> what I've done is introduced quals for WindowAgg nodes. This means
> that we can now see Filter in EXPLAIN For WindowAgg and "Rows Removed
> by Filter".
>
> Why didn't I just do the filtering in the outer query like was
> happening before? The problem is that when we push the quals down
> into the subquery, we don't yet have knowledge of which order that the
> WindowAggs will be evaluated in. Only run conditions from
> intermediate WindowAggs will ever make it into the Filter, and we
> don't know which one the top-level WindowAgg will be until later in
> planning. To do the filtering in the outer query we'd need to push
> quals back out the subquery again. It seems to me to be easier and
> better to filter them out lower down in the plan.
>
> Since the top-level WindowAgg node can now filter tuples, the executor
> node had to be given a for(;;) loop so that it goes around again for
> another tuple after it filters a tuple out.
>
> I've also updated the commit message which I think I've made quite
> clear about what we optimise and how it's done.
>
> > And I would suggest the below fastpath for this feature.
> > - if
> (check_and_push_window_quals(subquery, rte, rti, clause))
> > + if (!subquery->hasWindowFuncs ||
> check_and_push_window_quals(subquery, rte, rti, clause))
>
> Good idea. Thanks!
>
> David
>
Hi,

+ * We must keep the original qual in place if there is a
+ * PARTITION BY clause as the top-level WindowAgg remains in
+ * pass-through mode and does nothing to filter out unwanted
+ * tuples.
+ */
+ *keep_original = false;

The comment talks about keeping original qual but the assignment uses the
value false.
Maybe the comment can be rephrased so that it matches the assignment.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-04-07 04:05:19 Re: Speed up transaction completion faster after many relations are accessed in a transaction
Previous Message David Rowley 2022-04-07 03:25:06 Re: How about a psql backslash command to show GUCs?