Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-01-16 08:56:03
Message-ID: CAM2+6=XV6KVhB5jFrxYK+gOvKRUjr8oM1rJrupycNx51RtCw+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 16, 2018 at 3:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > Attached new set of patches adding this. Only patch 0007 (main patch)
> and
> > 0008 (testcase patch) has changed.
> >
> > Please have a look and let me know if I missed any.
>
> I spent a little time studying 0001 and 0002 today, as well as their
> relation with 0007. I find the way that the refactoring has been done
> slightly odd. With 0001 and 0002 applied, we end up with three
> functions for creating aggregate paths: create_partial_agg_path, which
> handles the partial-path case for both sort and hash;
> create_sort_agg_path, which handles the sort case for non-partial
> paths only; and create_hash_agg_path, which handles the hash case for
> non-partial paths only. This leads to the following code in 0007:
>
> + /* Full grouping paths */
> +
> + if (try_parallel_aggregation)
> + {
> + Assert(extra->agg_partial_costs &&
> extra->agg_final_costs);
> + create_partial_agg_path(root, input_rel,
> grouped_rel, target,
> +
> partial_target, extra->agg_partial_costs,
> +
> extra->agg_final_costs, gd, can_sort,
> +
> can_hash, (List *) extra->havingQual);
> + }
> +
> + if (can_sort)
> + create_sort_agg_path(root, input_rel,
> grouped_rel, target,
> +
> partial_target, agg_costs,
> +
> extra->agg_final_costs, gd, can_hash,
> +
> dNumGroups, (List *) extra->havingQual);
> +
> + if (can_hash)
> + create_hash_agg_path(root, input_rel,
> grouped_rel, target,
> +
> partial_target, agg_costs,
> +
> extra->agg_final_costs, gd, dNumGroups,
> + (List
> *) extra->havingQual);
>
> That looks strange -- you would expect to see either "sort" and "hash"
> cases here, or maybe "partial" and "non-partial", or maybe all four
> combinations, but seeing three things here looks surprising. I think
> the solution is just to create a single function that does both the
> work of create_sort_agg_path and the work of create_hash_agg_path
> instead of having two separate functions.
>

In existing code (in create_grouping_paths()), I see following pattern:
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

And thus, I have created three functions to match with existing pattern.

I will make your suggested changes that is merge create_sort_agg_path() and
create_hash_agg_path(). Will name that function as
create_sort_and_hash_agg_paths().

>
> A related thing that is also surprising is that 0007 manages to reuse
> create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
> cases -- in fact, the calls appear to be identical, and could be
> hoisted out of the "if" statement

Yes. We can do that as well and I think it is better too.
I was just trying to preserve the existing pattern. So for PWA I chose:
if partialAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)
else fullAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

But since, if (try_parallel_aggregation) case is exactly same, I will pull
that out of if..else.

> -- but create_sort_agg_path and
> create_hash_agg_path do not get reused. I think you should see
> whether you can define the new combo function that can be used for
> both cases. The logic looks very similar, and I'm wondering why it
> isn't more similar than it is; for instance, create_sort_agg_path
> loops over the input rel's pathlist, but the code for
> isPartialAgg/can_sort seems to consider only the cheapest path. If
> this is correct, it needs a comment explaining it, but I don't see why
> it should be correct.
>

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With above change, the logic is very similar except
(1) isPartialAgg/can_sort case creates the partial paths and
(2) finalization step is not needed at this stage.

I think it can be done by passing a flag to create_sort_agg_path() (or new
combo function) and making appropriate adjustments. Do you think addition of
this new flag should go in re-factoring patch or main PWA patch?
I think re-factoring patch.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2018-01-16 09:04:51 Re: [HACKERS] Restricting maximum keep segments by repslots
Previous Message REIX, Tony 2018-01-16 08:25:51 RE:[HACKERS] Deadlock in XLogInsert at AIX