Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-19 00:52:10
Message-ID: CAKJS1f8qKj+uigLciQfkpuXh8TxVPmioQECMvCHUbe4WUVLQwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19 March 2016 at 06:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> I've attached an updated patch.
>
> This looks substantially better than earlier versions, although I
> haven't studied every detail of it yet.
>
> + * Partial aggregation requires that each aggregate does not have a DISTINCT or
> + * ORDER BY clause, and that it also has a combine function set. Since partial
>
> I understand why partial aggregation doesn't work if you have an ORDER
> BY clause attached to the aggregate itself, but it's not so obvious to
> me that using DISTINCT should rule it out. I guess we can do it that
> way for now, but it seems aggregate-specific - e.g. AVG() can't cope
> with DISTINCT, but MIN() or MAX() wouldn't care. Maybe MIN() and
> MAX() are the outliers in this regard, but they are a pretty common
> case.

hmm? We'd have no way to ensure that two worker processes aggregated
only values which other workers didn't.
Of course this just happens to be equivalent for MIN() and MAX(), but
today we don't attempt to transform MIN(DISTINCT col) to MIN(col), so
I see no reason at all why this patch should go and add something
along those lines. Perhaps it's something for the future though,
although it's certainly not anything specific to parallel aggregate.

> + * An Aggref can operate in one of two modes. Normally an aggregate function's
> + * value is calculated with a single executor Agg node, however there are
> + * times, such as parallel aggregation when we want to calculate the aggregate
>
> I think you should adjust the punctuation to "with a single executor
> Agg node; however, there are". And maybe drop the word "executor".
>
> And on the next line, I'd add a comma: "such as parallel aggregation,
> when we want".

Fixed. Although I've revised that block a bit after getting rid of aggpartial.

>
> astate->xprstate.evalfunc =
> (ExprStateEvalFunc) ExecEvalAggref;
> - if (parent && IsA(parent, AggState))
> + if (!aggstate || !IsA(aggstate, AggState))
> {
> - AggState *aggstate =
> (AggState *) parent;
> -
> - aggstate->aggs = lcons(astate,
> aggstate->aggs);
> - aggstate->numaggs++;
> + /* planner messed up */
> + elog(ERROR, "Aggref found in
> non-Agg plan node");
> }
> - else
> + if (aggref->aggpartial ==
> aggstate->finalizeAggs)
> {
> /* planner messed up */
> - elog(ERROR, "Aggref found in
> non-Agg plan node");
> + if (aggref->aggpartial)
> + elog(ERROR, "Partial
> type Aggref found in FinalizeAgg plan node");
> + else
> + elog(ERROR,
> "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
> }
> + aggstate->aggs = lcons(astate, aggstate->aggs);
> + aggstate->numaggs++;
>
> This seems like it involves more code rearrangement than is really
> necessary here.

This is mostly gone, as after removing aggpartial some of these checks
are not possible. I just have some additional code:

Aggref *aggref = (Aggref *) node;

if (aggstate->finalizeAggs &&
aggref->aggoutputtype != aggref->aggtype)
{
/* planner messed up */
elog(ERROR, "Aggref aggoutputtype must match aggtype");
}

But nothing to sanity check non-finalize nodes.

>
> + * Partial paths in the input rel could allow us to perform
> aggregation in
> + * parallel, set_grouped_rel_consider_parallel() will determine if it's
> + * going to be safe to do so.
>
> Change comma to semicolon or period.

Changed.

> /*
> * Generate a HashAgg Path atop of the cheapest partial path, once
> * again, we'll only do this if it looks as though the hash table won't
> * exceed work_mem.
> */
>
> Same here. Commas are not the way to connect two independent sentences.

Changed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2016-03-19 01:12:12 Re: Weighted Stats
Previous Message Tomas Vondra 2016-03-19 00:38:30 Re: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system