Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2015-10-13 06:53:18
Message-ID: CAKJS1f9BqhMRQO0AUbVmmduoOunH0_azqT77G2BzX5azG=QPNA@mail.gmail.com
Lists: pgsql-hackers

On 13 October 2015 at 17:09, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > Also, I think the path for parallel aggregation should probably be
> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> > path here. I'm not clear whether that is what you are thinking or
> > not.
>
> No. I am thinking of the following way.
> Gather->partialagg->some partial path
>
> I want the Gather node to merge the results coming from all workers,
> otherwise it may be difficult to merge at parent of gather node. Because
> in case the partial group aggregate is under the Gather node, if any of
> two workers are returning same group key data, we need to compare them
> and combine it to make it a single group. If we are at Gather node, it is
> possible that we can wait till we get slots from all workers. Once all
> workers returns the slots we can compare and merge the necessary slots
> and return the result. Am I missing something?
>

My assumption is the same as Robert's here.
Unless I've misunderstood, it sounds like you're proposing to add logic
into the Gather node to handle final aggregation? That sounds like
a modularity violation of the whole node concept.

The handling of the final aggregate stage is not all that different from
the initial aggregate stage. The primary difference is just that you're
calling the combine function instead of the transition function, and the
values being aggregated are aggregate states rather than the type of the
values which were initially aggregated. The handling of GROUP BY is exactly
the same, except that the HAVING clause is applied only during final
aggregation. This is why I ended up implementing this in nodeAgg.c instead
of inventing some new node type that's mostly a copy and paste of
nodeAgg.c [1].
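
To make the symmetry between the two stages concrete, here is a minimal
Python sketch. It is not the nodeAgg.c code: the names transfn, combinefn,
finalfn, partial_agg and finalize_agg are made up for illustration, and an
avg()-style aggregate with a (sum, count) state is assumed. The same
grouping loop serves both stages; only the step function and the input
type differ, and HAVING is applied only in the final stage.

```python
from collections import defaultdict

# Hypothetical avg() aggregate: the state is a (sum, count) pair.
INIT_STATE = (0, 0)

def transfn(state, value):
    """Transition function: fold one raw input value into a state."""
    total, count = state
    return (total + value, count + 1)

def combinefn(state_a, state_b):
    """Combine function: merge two aggregate states into one."""
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def finalfn(state):
    """Final function: turn a state into the aggregate's result."""
    return state[0] / state[1]

def aggregate(rows, step):
    """Shared GROUP BY loop: fold each (key, item) into its group's state."""
    groups = defaultdict(lambda: INIT_STATE)
    for key, item in rows:
        groups[key] = step(groups[key], item)
    return groups

def partial_agg(rows):
    """Per-worker stage: raw values in, (key, state) rows out."""
    return list(aggregate(rows, transfn).items())

def finalize_agg(partial_rows, having=lambda result: True):
    """Final stage: states in, results out; HAVING is applied only here."""
    states = aggregate(partial_rows, combinefn)
    results = {key: finalfn(state) for key, state in states.items()}
    return {key: value for key, value in results.items() if having(value)}

# Two workers each aggregate their own share of the rows.
worker1 = partial_agg([("a", 1), ("b", 10)])
worker2 = partial_agg([("a", 3), ("b", 20), ("c", 5)])

# Stand-in for Gather: it only passes rows up, it does not merge groups.
gathered = worker1 + worker2

print(finalize_agg(gathered))
print(finalize_agg(gathered, having=lambda avg: avg > 4))
```

Note that the stand-in for Gather is plain concatenation; all of the
group merging happens in the finalize step above it.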

If you're performing a hash aggregate you need to wait until all the
partially aggregated groups are received anyway. If you're doing a sort/agg
then you'll need to sort again after the Gather node.
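
The sort/agg case can be sketched the same way. This is an illustration
only, assuming partial count states and assuming Gather hands rows up in
arrival order; group_agg is a hypothetical stand-in for a sorted-input
aggregate that emits a group whenever the key changes. Each worker's
output is sorted, but the gathered stream is not, so the same key shows up
as two groups unless the rows are sorted again first.

```python
from itertools import groupby
from operator import add, itemgetter

def group_agg(rows, combine):
    """Sorted-input aggregation: start a new group whenever the key changes."""
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        state = None
        for _, partial_state in group:
            state = partial_state if state is None else combine(state, partial_state)
        out.append((key, state))
    return out

# Partial count states from two workers, each sorted by group key.
worker1 = [("a", 2), ("c", 1)]
worker2 = [("a", 1), ("b", 4)]

# Assumed arrival order at Gather: not sorted overall.
gathered = worker1 + worker2

# Without a sort, group "a" is emitted twice.
unsorted_result = group_agg(gathered, add)

# Sorting again above Gather gives one row per group.
sorted_result = group_agg(sorted(gathered, key=itemgetter(0)), add)

print(unsorted_result)
print(sorted_result)
```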

[1]
http://www.postgresql.org/message-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
