Re: Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise aggregation/grouping
Date: 2017-10-09 12:10:04
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Thu, Sep 28, 2017 at 3:12 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> Here are comments on 0004 from last patch set. But most the comments
> still apply.

Thank you, Ashutosh for reviewing.

> Patch 0001 adds functions create_hash_agg_path() and
> create_sort_agg_path().
> Patch 0004 adds a new argument to those functions for conditions in HAVING
> clause. We should move those changes to 0001 and pass parse->havingQual to
> these functions in 0001 itself. That will keep all changes to those
> functions
> together and also make 0003 small.


> The prologue of try_partition_wise_grouping() mentions a restriction of
> partition keys being leading group by clauses. This restriction is not
> specified in the prologue of have_grouping_by_partkey(), which actually
> checks
> for this restriction. The requirement per prologue of that function is
> just to
> have partition keys in group clause. I think have_grouping_by_partkey() is
> correct, and we should change prologue of try_partition_wise_grouping() to
> be
> in sync with have_grouping_by_partkey().


The prologue explains why
> partition-wise aggregation/grouping would be efficient with this
> restriction,
> but it doesn't explain why partial aggregation/grouping per partition
> would be
> efficient. May be it should do that as well. Similar is the case with
> partial
> aggregation/grouping discussion in README.

I have tried updating it. Please check.

> + /* Do not handle grouping sets for now. */
> Is this a real restriction or just restriction for first cut of this
> feature?
> Can you please add more explanation? May be update README as well?

Grouping sets plan does not work with an inheritance subtree (see notes in
create_groupingsets_plan). Thus grouping sets are not handled here.

> + grouped_rel->part_scheme = input_rel->part_scheme;
> Is this true even for partial aggregates? I think not. Since group by
> clause
> does not contain partition keys, the rows from multiple partitions
> participate
> in one group and thus the partition keys of input relation do not apply to
> the
> grouped relation. In this case, it seems that the grouped rel will have
> part_rels but will not be partitioned.

I have removed this as your analysis is correct. grouped_rel is not

> + /*
> + * If there is no path for the child relation, then we cannot add
> + * aggregate path too.
> + */
> + if (input_child_rel->pathlist == NIL)
> + return;
> When can this happen? I think, similar to partition-wise join it happens
> when
> there is a dummy parent relation. See [1]. If that's the case, you may
> want to
> do things similar to what partition-wise join is doing. If there's some
> other
> reason for doing this, returing from here half-way is actually waste of
> memory
> and planning time. Instead, we may want to loop over the part_rels to find
> if
> any of those have empty pathlist and return from there before doing any
> actual
> work.

This is kind of can't happen scenario, so I have converted it to an
And yes, I am marking a grouped_rel as dummy rel when input rel is dummy.

> + extra.pathTarget = child_target;
> + extra.inputRows = input_child_rel->cheapest_startup_path->rows;
> + extra.havingQual = (Node *) adjust_appendrel_attrs(root,
> + (Node *)
> query->havingQual,
> + nappinfos,
> + appinfos);
> These lines are updating some fields of "extra" structure in every loop.
> The
> structure is passed to create_child_grouping_paths() in the loop and to
> add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel()
> only
> gets some member values for the last child. Is that right?

No. Patch do update those fields before calling add_paths_to_append_rel().

Should we split
> extra into two structures one to be used within the loop and one outside?
> Or
> may be send the members being updated within the loop separately?

I don't see any point in splitting. We need almost all fields at child path
creation as well as at finalization step. The patch basically just re-using
the struct variable.

> + /*
> + * Forcibly apply scan/join target to all the Paths for the
> scan/join
> + * rel.
> + *
> [ lines clipped ]
> + if (subpath == input_child_rel->cheapest_total_path)
> + input_child_rel->cheapest_total_path = path;
> + }
> + }
> This code seems to be copied from grouping_planner() almost verbatim. Is
> there
> a way we can refactor it into a function and use it in both the places.

Moved this in

> have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
> whether a given clause expression matches any of the partition keys. Or you
> could use list_intersection() instead of following loop
> + foreach(lc, partexprs)
> + {
> + Expr *partexpr = lfirst(lc);
> +
> + if (list_member(groupexprs, partexpr))
> + {
> + found = true;
> + break;
> + }
> + }
> + /*
> + * If none of the partition key matches with any of the GROUP BY
> + * expression, return false.
> + */
> + if (!found)
> + return false;

Well, the logic in match_expr_to_partition_keys() does not exactly match
the scenarios here. It may match with few alterations but then it will
complex. So better to have them separate.

list_intersection() is a good suggestion as it will reduce this block
altogether and will have less lines-of-code to maintain. However, it returns
a list of all matching cells from List1 which is done by comparing all
elements. But here in this case we don't need to match further after very
first match. Thus this logic saves on those unnecessary matching.

> create_child_grouping_paths() and create_grouping_paths() has almost
> similar
> code. Is there a way we could refactor the code to extract common code
> into a
> function called by these two functions or reuse create_grouping_paths() for
> children as well? I don't think we will be able to do the later.

After refactoring most of the code in create_grouping_paths() (0001-0003),
it is very little code remained which is duplicated. Refactoring those few
lines into another function looks odd.
Let me know, if you still think to refactor those few lines in a separate

> + /* Nothing to do if there is an empty pathlist */
> + if (grouped_rel->pathlist == NIL)
> + return false;
> When would that happen? Similar condition in case of parent grouped rel
> throws
> an error, so when this code is called, we know for sure that parent had
> non-empty pathlist. So, we would expect child to have non-empty pathlist as
> well.

Yes and agree too. This is kind of not-reachable return.
Do you mean we should also throw an error here like in case of parent
rel? I opted to not throw an error and instead go with the non

> + grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
> + input_rel->relids);
> +
> + /* Mark this rel as "other upper rel" */
> + grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
> I think we need to pass relopkind as an argument to fetch_upper_rel(), now
> that
> we have "upper" relations and "other upper" relations. relids will still
> be a
> "key" to find an upper relation but its reloptkind should match the given
> reloptkind. fetch_upper_rel() is used to create the upper relation if it
> doesn't exist. So, with the above code, if some other function calls
> fetch_upper_rel() with given relids, it would get an upper rel with
> RELOPT_UPPER_REL and then this code would change it to
> That looks odd. May be we should have written build_upper_rel() and
> find_upper_rel() similar to build_join_rel() and find_join_rel() instead of
> combining both the functionalities in one function.

Make sense. But I am reluctant to update fetch_upper_rel() and all it's
However, do you think having a separate function for other upper rel for
is a good idea, named fetch_other_upper_rel() in-lined with

> + /*
> + * create_append_path() sets the path target from the given
> relation.
> + * However, in this case grouped_rel doesn't have a target set.
> So we
> + * must set the pathtarget to the passed in target.
> + */
> + apath->pathtarget = target;
> I think, we should change create_append_path() functions to accept target
> as an
> additional argument. For append rels other than aggregate and grouping,
> target
> will be same as relation's target. For agg/group append rels, we will pass
> different targets for partial and non-partial grouping paths.

Done in 0005-Pass-pathtarget-to-create_-merge_-append_path.patch.

> + /*
> + * Since Append's output is always unsorted, we'll need to sort,
> + * unless there's no GROUP BY clause or a degenerate (constant)
> one,
> + * in which case there will only be a single group.
> + */
> append path here can be output of either merge append or append. If it's
> output
> of merge append, we don't need to sort it again, do we?

Yes, you are right, we don't need an explicit sort over merge-append.
Done those changes.

> create_partition_agg_paths() creates append paths and then adds
> finalization
> path if necessary. The code to add finalization path seems to be similar
> to the
> code that adds finalization path for parallel query. May be we could take
> out
> common code into a function and call that function in two places. I see
> this
> function as accepting a partial aggregation/grouping path and returning a
> path
> that finalizes partial aggregates/groups.

It seems that it will become messy. Per my understanding the only common
is related to the add_path() call with appropriate create_agg_path() or
create_group_path(). Those are anyways function calls and I don't see any
reason to split them into the separate function.

Attached new patch set having HEAD at 84ad4b0 with all these review points
fixed. Let me know if I missed any thanks.

I have merged parallelism changes into main patch i.e. 0007 as most of the
changes in that patch are actual modifying same lines added by 0007.


Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
partition-wise-agg-v5.tar.gz application/x-gzip 32.0 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-10-09 12:21:21 Re: [POC] hash partitioning
Previous Message amul sul 2017-10-09 11:14:29 Re: [POC] hash partitioning