Re: Parallel Aggregate

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2015-10-13 07:20:25
Message-ID: CANP8+jKf12st2hUxC5ZNaAGrjX9E728spdjSyS81WNRmymvhqw@mail.gmail.com
Lists: pgsql-hackers

On 13 October 2015 at 02:14, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> > Parallel aggregate is a feature that performs aggregation in
> > parallel with the help of Gather and
> > partial seq scan nodes. The following is a basic overview of the
> > parallel aggregate changes.
> >
> > Decision phase:
> >
> > Based on the following conditions, the parallel aggregate plan is
> > generated.
> >
> > - Check whether the plan node below is only Gather + partial seq scan.
> >
> > This is to check whether the plan nodes that are present are
> > aware of parallelism.
>
> This is really not the right way of doing this. We should do
> something more general. Most likely, parallel aggregate should wait
> for Tom's work refactoring the upper planner to use paths. But either
> way, it's not a good idea to limit ourselves to parallel aggregation
> only in the case where there is exactly one base table.
>

What we discussed at PgCon was this rough flow of work:

* Pathify upper planner (Tom): WIP
* Aggregation push down (David): Prototype
* Parallel Aggregates

Parallel aggregation also depends on the parallel infrastructure, though
that dependency is currently further along than the items above.

Parallel aggregates do look like they can make it into 9.6, but there's not
much slack left in the critical path.

> One of the things I want to do pretty early on, perhaps in time for
> 9.6, is create a general notion of partial paths. A Partial Seq Scan
> node creates a partial path. A Gather node turns a partial path into
> a complete path. A join between a partial path and a complete path
> creates a new partial path. This concept lets us consider,
> essentially, pushing joins below Gather nodes. That's quite powerful
> and could make Partial Seq Scan applicable to a much broader variety
> of use cases. If there are worthwhile partial paths for the final
> joinrel, and aggregation of that joinrel is needed, we can consider
> parallel aggregation using that partial path as an alternative to
> sticking a Gather node on there and then aggregating.

Some form of partial plan makes sense. A better word might be "strand".
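
For illustration only, the composition rules quoted above can be sketched in a few lines of Python. This is not planner code: the Path class and the function names are hypothetical, and the rule that rejects a join of two partial paths is an assumption of the sketch, since the quoted text only covers joining a partial path with a complete one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical stand-in for a planner path."""
    desc: str
    partial: bool  # True: each worker sees only a subset of the rows


def seq_scan(table: str) -> Path:
    # An ordinary scan returns every row, so the path is complete.
    return Path(f"SeqScan({table})", partial=False)


def partial_seq_scan(table: str) -> Path:
    # A Partial Seq Scan creates a partial path.
    return Path(f"PartialSeqScan({table})", partial=True)


def gather(child: Path) -> Path:
    # A Gather node turns a partial path into a complete path.
    if not child.partial:
        raise ValueError("Gather expects a partial path")
    return Path(f"Gather({child.desc})", partial=False)


def join(outer: Path, inner: Path) -> Path:
    # Partial joined with complete gives a new partial path.
    # Assumption of this sketch: two partial inputs are rejected, because
    # each worker would then see only a subset of both sides.
    if outer.partial and inner.partial:
        raise ValueError("cannot join two partial paths in this sketch")
    return Path(f"Join({outer.desc}, {inner.desc})",
                partial=outer.partial or inner.partial)


# The join is pushed below the Gather node.
plan = gather(join(partial_seq_scan("a"), seq_scan("b")))
print(plan.desc)
```

The point of the sketch is that the join result stays partial, so the Gather can sit above the join rather than directly above the scan.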

> > - Set single_copy mode to true if the node below
> > Gather is a parallel aggregate.
>
> That sounds wrong. Single-copy mode is for when we need to be certain
> of running exactly one copy of the plan. If you're trying to have
> several workers aggregate in parallel, that's exactly what you don't
> want.
>
> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here. I'm not clear whether that is what you are thinking or
> not.
>

Yes, but not sure of names.
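
As a rough sketch of the FinalizeAgg -> Gather -> PartialAgg shape, here is a two-phase avg() in Python. The function names simply mirror the node names suggested above and are not settled; the (sum, count) transition state and the chunking of rows across workers are assumptions made for the example, and the workers are simulated sequentially.

```python
def partial_agg(rows):
    """Runs in each worker: reduce its rows to a (sum, count) state."""
    total = 0.0
    count = 0
    for value in rows:
        total += value
        count += 1
    return (total, count)


def gather(worker_states):
    """Collects one partial state per worker into a single list."""
    return list(worker_states)


def finalize_agg(states):
    """Runs once, above Gather: combine the states and produce avg()."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None


# Each inner list stands for the rows one worker's partial scan returned.
chunks = [[1, 2, 3], [4, 5], [6]]
states = gather(partial_agg(chunk) for chunk in chunks)
result = finalize_agg(states)
print(result)
```

Only the small per-worker states cross the Gather boundary, which is the attraction of this shape over gathering all the rows and then aggregating.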

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
