Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2015-12-03 05:18:35
Message-ID: CAKJS1f9k5Ej57dJ2oCJrht=ZzO8twpQsktO08K4103b3cpQsSg@mail.gmail.com
Lists: pgsql-hackers

On 20 October 2015 at 23:23, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> On 13 October 2015 at 20:57, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>
>> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
>> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> > On 13 October 2015 at 17:09, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> >> wrote:
>> >> > Also, I think the path for parallel aggregation should probably be
>> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> >> > path here. I'm not clear whether that is what you are thinking or
>> >> > not.
>> >>
>> >> No. I am thinking of the following way.
>> >> Gather->partialagg->some partial path
>> >>
>> >> I want the Gather node to merge the results coming from all workers;
>> >> otherwise it may be difficult to merge them at the parent of the
>> >> Gather node. If the partial group aggregate is under the Gather node
>> >> and any two workers return data for the same group key, we need to
>> >> compare those rows and combine them into a single group. At the
>> >> Gather node, we can wait until we get slots from all workers. Once
>> >> all workers return their slots, we can compare and merge the
>> >> necessary slots and return the result. Am I missing something?
>> >
>> >
>> > My assumption is the same as Robert's here.
>> > Unless I've misunderstood, it sounds like you're proposing to add
>> > logic into the Gather node to handle final aggregation? That sounds
>> > like a modularity violation of the whole node concept.
>> >
>> > The handling of the final aggregate stage is not all that different
>> > from the initial aggregate stage. The primary difference is just that
>> > you're calling the combine function instead of the transition
>> > function, and the values
>>
>> Yes, you are correct. Until now I was thinking of using transition
>> types as the approach, and for that reason I proposed having the Gather
>> node handle the finalize aggregation.
>>
>> > being aggregated are aggregate states rather than the type of the
>> > values which were initially aggregated. The handling of GROUP BY is
>> > all the same, yet you only apply the HAVING clause during final
>> > aggregation. This is why I ended up implementing this in nodeAgg.c
>> > instead of inventing some new node type that's mostly a copy and
>> > paste of nodeAgg.c [1]
>>
>> After going through your Partial Aggregation / GROUP BY before JOIN patch,
>> the following is my understanding of parallel aggregate.
>>
>> Finalize [hash] aggregate
>>   -> Gather
>>     -> Partial [hash] aggregate
>>
>> The data that comes from the Gather node contains the group key and
>> the grouping results. Based on these, in the hash aggregate case the
>> finalize aggregate can build another hash table and return the final
>> results. This approach works for both plain and hash aggregates.
>>
>> To support group aggregate in parallel aggregate, the plan should be
>> as follows.
>>
>> Finalize Group aggregate
>>   -> Sort
>>     -> Gather
>>       -> Partial group aggregate
>>         -> Sort
>>
>> The data that comes from the Gather node needs to be sorted again on
>> the grouping key and then merged to generate the final grouping result.
>>
>> With this approach, we do not need to change anything in the Gather
>> node. Is my understanding correct?
>>
>>
> Our understandings are aligned.
>
>
Hi,

I just wanted to cross-post here to note that I've posted an updated patch
for combining aggregate states:
http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com

I also wanted to check whether you've managed to make any progress on
Parallel Aggregation. I'm very interested in this myself and would like to
move it forward if you're not already doing so.

My current thinking is that most of the remaining changes required for
parallel aggregation, after applying the combine aggregate state patch,
will be in the exact area where Tom will be making changes for the upper
planner path-ification work. I'm not sure whether we should hold off for
that or not.
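
To make the plan shape discussed above concrete, here is a minimal sketch of
Finalize Agg -> Gather -> Partial Agg for an avg()-style aggregate with hash
grouping. It is only an illustration in Python, not code from the patch: the
function names (init_state, transition, combine, final, partial_agg, gather,
finalize_agg) and the (sum, count) state layout are all assumptions made for
the example. The point is that Gather only concatenates worker output, the
final stage calls the combine function on aggregate states rather than the
transition function on input values, and HAVING is applied only at the end.

```python
from collections import defaultdict

# Hypothetical avg() aggregate; the state is (running sum, row count).
def init_state():
    return (0, 0)

def transition(state, value):
    """Transition function: fold one input value into a state."""
    total, count = state
    return (total + value, count + 1)

def combine(a, b):
    """Combine function: merge two partial states for the same group."""
    return (a[0] + b[0], a[1] + b[1])

def final(state):
    """Final function: turn a state into the aggregate result."""
    total, count = state
    return total / count if count else None

def partial_agg(rows):
    """Partial aggregate: runs per worker, emits (group key, state) pairs."""
    states = defaultdict(init_state)
    for key, value in rows:
        states[key] = transition(states[key], value)
    return list(states.items())

def gather(worker_outputs):
    """Gather: concatenates worker output and knows nothing of aggregates."""
    for output in worker_outputs:
        yield from output

def finalize_agg(partials, having=lambda result: True):
    """Finalize aggregate: combine states per group, then apply HAVING."""
    states = {}
    for key, state in partials:
        states[key] = combine(states[key], state) if key in states else state
    results = {key: final(state) for key, state in states.items()}
    return {key: res for key, res in results.items() if having(res)}

# Two workers each see part of the input; group "a" appears in both.
worker_rows = [
    [("a", 1), ("b", 10), ("a", 3)],
    [("a", 5), ("c", 7)],
]
result = finalize_agg(
    gather(partial_agg(rows) for rows in worker_rows),
    having=lambda avg: avg < 10,
)
print(result)
```

For the sorted (group aggregate) variant, the same combine step would run
over the Gather output after it has been re-sorted on the grouping key,
merging adjacent rows with equal keys instead of probing a hash table.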

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
