Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-06 10:31:35
Message-ID: CAM2+6=Wd3f6PKjNKgYxaY+NPvtuhk5_oRZhqLOp7-N4PG8mPHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 6, 2018 at 2:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > However, to perform Gather or Gather Merge once we have all partial paths
> > ready, and to avoid too many existing code rearrangement, I am calling
> > try_partitionwise_grouping() before we do any aggregation/grouping on
> whole
> > relation. By doing this, we will be having all partial paths in
> > partially_grouped_rel and then existing code will do required
> finalization
> > along with any Gather or Gather Merge, if required.
> >
> > Please have a look over attached patch-set and let me know if it needs
> > further changes.
>
> This does look better.
>

Thank you, Robert.

>
> + <term><varname>enable_partitionwise_agg</varname>
> (<type>boolean</type>)
>
> Please don't abbreviate "aggregate" to "agg".
>

This is in-lined with enable_hashagg GUC. Do you think
enable_partitionwise_aggregate
seems better? But it will be not consistent with other GUC names like
enable_hashagg then.

>
> - /* Build final grouping paths */
> - add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
> -
> partially_grouped_rel, agg_costs,
> -
> &agg_final_costs, gd, can_sort, can_hash,
> - dNumGroups,
> (List *) parse->havingQual);
> + if (isPartialAgg)
> + {
> + Assert(agg_partial_costs && agg_final_costs);
> + add_paths_to_partial_grouping_rel(root, input_rel,
> +
> partially_grouped_rel,
> +
> agg_partial_costs,
> +
> gd, can_sort, can_hash,
> +
> false, true);
> + }
> + else
> + {
> + double dNumGroups;
> +
> + /* Estimate number of groups. */
> + dNumGroups = get_number_of_groups(root,
> +
> cheapest_path->rows,
> +
> gd,
> +
> child_data ? make_tlist_from_pathtarget(target) :
> parse->targetList);
> +
> + /* Build final grouping paths */
> + add_paths_to_grouping_rel(root, input_rel, grouped_rel,
> target,
> +
> partially_grouped_rel, agg_costs,
> +
> agg_final_costs, gd, can_sort, can_hash,
> +
> dNumGroups, (List *) havingQual);
> + }
>
> This looks strange. Why do we go down two completely different code
> paths here?

It is because when isPartialAgg = true we need to create partially
aggregated non-partial paths which should be added in
partially_grouped_rel->pathlist. And when isPartialAgg = false, we are
creating fully aggregated paths which goes into grouped_rel->pathlist.

> It seems to me that the set of paths we add to the
> partial_pathlist shouldn't depend at all on isPartialAgg. I might be
> confused, but it seems to me that any aggregate path we construct that
> is going to run in parallel must necessarily be partial, because even
> if each group will occur only in one partition, it might still occur
> in multiple workers for that partition, so finalization would be
> needed.

Thats's true. We are creating partially aggregated partial paths for this
and keeps them in partially_grouped_rel->partial_pathlist.

> On the other hand, for non-partial paths, we can add then to
> partially_grouped_rel when isPartialAgg = true and to grouped_rel when
> isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
> vs. AGGSPLIT_INITIAL_SERIAL.

Yes. As explained above, they goes in pathlist of respective Rel.
However, PathTarget is different too, we need partial_pathtarget when
isPartialAgg = true and also need agg_partial_costs.

> But that doesn't appear to be what this
> is doing.
>

So the code for doing partially aggregated partial paths and partially
aggregated non-partial path is same except partial paths goes into
partial_pathlist where as non-partial goes into pathlist of
partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
when isPartialAgg = true seems correct. Also as we have decided, this
function is responsible to create all partially aggregated paths including
both partial and non-partial.

Am I missing something?

>
> + /*
> + * If there are any fully aggregated partial paths present,
> may be because
> + * of parallel Append over partitionwise aggregates, we must stick
> a
> + * Gather or Gather Merge path atop the cheapest partial path.
> + */
> + if (grouped_rel->partial_pathlist)
>
> This comment is copied from someplace where the code does what the
> comment says, but here it doesn't do any such thing.
>

Well, these comments are not present anywhere else than this place. With
Parallel Append and Partitionwise aggregates, it is now possible to have
fully aggregated partial paths now. And thus we need to stick a Gather
and/or Gather Merge atop cheapest partial path. And I believe the code does
the same. Am I missing something?

>
> More tomorrow...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton Dignös 2018-03-06 10:41:58 Re: IndexJoin memory problem using spgist and boxes
Previous Message Amit Langote 2018-03-06 10:20:10 Re: constraint exclusion and nulls in IN (..) clause