Re: [HACKERS] Partition-wise aggregation/grouping

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-05 20:59:36
Message-ID: CA+TgmoZSSLx0aYokv8sJdB7ExfrjE7s+Ee3aVgnvVk9me=_KEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> However, to perform Gather or Gather Merge once we have all partial paths
> ready, and to avoid too many existing code rearrangement, I am calling
> try_partitionwise_grouping() before we do any aggregation/grouping on whole
> relation. By doing this, we will be having all partial paths in
> partially_grouped_rel and then existing code will do required finalization
> along with any Gather or Gather Merge, if required.
>
> Please have a look over attached patch-set and let me know if it needs
> further changes.

This does look better.

+ <term><varname>enable_partitionwise_agg</varname> (<type>boolean</type>)

Please don't abbreviate "aggregate" to "agg".

- /* Build final grouping paths */
- add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-
partially_grouped_rel, agg_costs,
-
&agg_final_costs, gd, can_sort, can_hash,
- dNumGroups,
(List *) parse->havingQual);
+ if (isPartialAgg)
+ {
+ Assert(agg_partial_costs && agg_final_costs);
+ add_paths_to_partial_grouping_rel(root, input_rel,
+
partially_grouped_rel,
+
agg_partial_costs,
+
gd, can_sort, can_hash,
+
false, true);
+ }
+ else
+ {
+ double dNumGroups;
+
+ /* Estimate number of groups. */
+ dNumGroups = get_number_of_groups(root,
+
cheapest_path->rows,
+
gd,
+
child_data ? make_tlist_from_pathtarget(target) :
parse->targetList);
+
+ /* Build final grouping paths */
+ add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+
partially_grouped_rel, agg_costs,
+
agg_final_costs, gd, can_sort, can_hash,
+
dNumGroups, (List *) havingQual);
+ }

This looks strange. Why do we go down two completely different code
paths here? It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg. I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed. On the other hand, for non-partial paths, we can add then to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL. But that doesn't appear to be what this
is doing.

+ /*
+ * If there are any fully aggregated partial paths present,
may be because
+ * of parallel Append over partitionwise aggregates, we must stick a
+ * Gather or Gather Merge path atop the cheapest partial path.
+ */
+ if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

More tomorrow...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2018-03-05 21:19:49 Re: [PATCH] Verify Checksums during Basebackups
Previous Message Tom Lane 2018-03-05 20:52:06 Re: PATCH: psql tab completion for SELECT