Re: Parallel Aggregate

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2015-10-13 01:14:28
Message-ID: CA+TgmoY=yiy-VXtiGDpV70dp3vwtAMnhkm1BSqisqJB5+gBm-Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> Parallel aggregate is the feature that performs aggregation in parallel
> with the help of Gather and
> partial seq scan nodes. The following is a basic overview of the
> parallel aggregate changes.
>
> Decision phase:
>
> The parallel aggregate plan is generated when the following conditions hold.
>
> - Check whether the plan below is only Gather + partial seq scan.
>
> This is to check whether the plan nodes that are present are
> aware of parallelism.

This is really not the right way of doing this. We should do
something more general. Most likely, parallel aggregate should wait
for Tom's work refactoring the upper planner to use paths. But either
way, it's not a good idea to limit ourselves to parallel aggregation
only in the case where there is exactly one base table.

One of the things I want to do pretty early on, perhaps in time for
9.6, is create a general notion of partial paths. A Partial Seq Scan
node creates a partial path. A Gather node turns a partial path into
a complete path. A join between a partial path and a complete path
creates a new partial path. This concept lets us consider,
essentially, pushing joins below Gather nodes. That's quite powerful
and could make Partial Seq Scan applicable to a much broader variety
of use cases. If there are worthwhile partial paths for the final
joinrel, and aggregation of that joinrel is needed, we can consider
parallel aggregation using that partial path as an alternative to
sticking a Gather node on there and then aggregating.
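The partial-path rules described above can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL's planner code: `Path`, `partial_seq_scan`, `seq_scan`, `gather`, `join`, and `aggregate_plans` are all hypothetical names. It encodes only the three rules stated in the message (Partial Seq Scan yields a partial path, Gather turns a partial path into a complete one, joining a partial path with a complete path yields a partial path) and then lists the two aggregation alternatives for a partial joinrel. Joining two partial paths is not covered by those rules, so the sketch simply rejects it.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical toy model; these names do not correspond to PostgreSQL's planner structs.
@dataclass(frozen=True)
class Path:
    desc: str
    partial: bool  # True if each worker produces only a subset of the rows


def partial_seq_scan(rel: str) -> Path:
    # Each worker scans a disjoint subset of the table, so the result is partial.
    return Path(f"PartialSeqScan({rel})", partial=True)


def seq_scan(rel: str) -> Path:
    # An ordinary scan returns every row: a complete path.
    return Path(f"SeqScan({rel})", partial=False)


def gather(p: Path) -> Path:
    # Gather collects the output of all workers, turning a partial path into a complete one.
    if not p.partial:
        raise ValueError("this sketch only gathers partial paths")
    return Path(f"Gather({p.desc})", partial=False)


def join(outer: Path, inner: Path) -> Path:
    if outer.partial and inner.partial:
        # Not covered by the rules above; rejected in this sketch.
        raise ValueError("joining two partial paths is not modelled here")
    # partial + complete -> partial; complete + complete -> complete
    return Path(f"Join({outer.desc}, {inner.desc})",
                partial=outer.partial or inner.partial)


def aggregate_plans(p: Path) -> List[str]:
    # Enumerate aggregation strategies for the final joinrel's path.
    if p.partial:
        return [
            # Aggregate inside each worker, then combine the partial results above Gather.
            f"FinalizeAgg(Gather(PartialAgg({p.desc})))",
            # Or gather all rows first and aggregate them in the leader.
            f"Agg({gather(p).desc})",
        ]
    return [f"Agg({p.desc})"]


rel = join(partial_seq_scan("orders"), seq_scan("customers"))
for plan in aggregate_plans(rel):
    print(plan)
```

The first alternative is the parallel aggregation plan; the second is the "stick a Gather node on there and then aggregate" option. A real planner would cost both and keep the cheaper one.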

> - Set single_copy mode to true if the node below
> Gather is a parallel aggregate.

That sounds wrong. Single-copy mode is for when we need to be certain
of running exactly one copy of the plan. If you're trying to have
several workers aggregate in parallel, that's exactly what you don't
want.

Also, I think the path for parallel aggregation should probably be
something like FinalizeAgg -> Gather -> PartialAgg -> some partial
path here. I'm not clear whether that is what you are thinking or
not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
