Re: [HACKERS] Partition-wise aggregation/grouping

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-01-31 19:41:37
Message-ID: CA+TgmobrzFYS3+U8a_BCy3-hOvh5UyJbC18rEcYehxhpw5=ETA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> Attached new patch set and rebased it on latest HEAD.

I strongly dislike add_single_path_to_append_rel. It adds branches
and complexity to code that is already very complex. Most
importantly, why are we adding paths to fields in
OtherUpperPathExtraData *extra instead of adding them to the path list
of some RelOptInfo? If we had an appropriate RelOptInfo to which we
could add the paths, then we could make this simpler.

If I understand correctly, the reason you're doing it this way is
because we have no place to put partially-aggregated, non-partial
paths. If we only needed to worry about the parallel query case, we
could just build an append of partially-aggregated paths for each
child and stick it into the grouped rel's partial pathlist, just as we
already do for regular parallel aggregation. There's no reason why
add_paths_to_grouping_rel() needs to care about the difference a
Partial Aggregate on top of whatever and an Append each branch of
which is a Partial Aggregate on top of whatever. However, this won't
work for non-partial paths, because add_paths_to_grouping_rel() needs
to put paths into the grouped rel's pathlist -- and we can't mix
together partially-aggregated paths and fully-aggregated paths in the
same path list.

But, really, the way we're using grouped_rel->partial_pathlist right
now is an awful hack. What I'm thinking we could do is introduce a
new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
before UPPERREL_GROUP_AGG. Without partition-wise aggregate,
partially_grouped_rel's pathlist would always be NIL, and its partial
pathlist would be constructed using the logic in
add_partial_paths_to_grouping_rel, which would need renaming. Then,
add_paths_to_grouping_rel would use paths from input_rel when doing
non-parallel aggregation and paths from partially_grouped_rel when
doing parallel aggregation. This would eliminate the ugly
grouped_rel->partial_pathlist = NIL assignment at the bottom of
create_grouping_paths(), because the grouped_rel's partial_pathlist
would never have been (bogusly) populated in the first place, and
hence would not need to be reset. All of these changes could be made
via a preparatory patch.

Then the main patch needs to worry about four cases:

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key. This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist. The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition. An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

2. Parallel partition-wise aggregate, grouping key contains partition
key. For the most part, this is no different from case #1. We won't
have groups spanning different partitions in this case, but we might
have groups spanning different workers, so we still need a
FinalizeAggregate step. As an exception, Gather -> Parallel Append ->
[non-partial Aggregate path] would give us a way of doing aggregation
in parallel without a separate Finalize step. I'm not sure if we want
to consider that to be in scope for this patch. If we do, then we'd
add the Parallel Append path to grouped_rel->partial_pathlist. Then,
we could stick Gather (Merge) on top if it to produce a path for
grouped_rel->pathlist using generate_gather_paths(); alternatively, it
can be used by upper planning steps -- something we currently can't
ever make work with parallel aggregation.

3. Non-parallel partition-wise aggregate, grouping key contains
partition key. Build Append paths from the children of grouped_rel
and add them to grouped_rel->pathlist.

3. Non-parallel partition-wise aggregate, grouping key doesn't contain
partition key. Build Append paths from the children of
partially_grouped_rel and add them to partially_grouped_rel->pathlist.
Also add code to generate paths for grouped_rel->pathlist by sticking
a FinalizeAggregate step on top of each path from
partially_grouped_rel->pathlist.

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.

I think this would probably eliminate some other problems I see in the
existing design as well. For example, create_partition_agg_paths
doesn't consider using Gather Merge, but that might be a win. With
the design above, I think you never need to call create_gather_path()
anywhere. In case #1, the existing code takes care of it. In the
special case mentioned under #2, if we chose to support that,
generate_gather_paths() would take care of it. Both of those places
already know about Gather Merge.

On another note, I found preferFullAgg to be wicked confusing. To
"prefer" something is to like it better, but be willing to accept
other options if the preference can't be accommodated. Here, it seems
like preferFullAgg = false prevents consideration of full aggregation.
So it's really more like allowFullAgg, or, maybe better,
try_full_aggregation. Also, try_partition_wise_grouping has a
variable isPartialAgg which is always ends up getting set to
!preferFullAgg. Having two Boolean variables which are always set to
the opposite of each other isn't good. To add to the confusion, the
code following the place where isPartialAgg is set sometimes refers to
isPartialAgg and sometimes refers to preferFullAgg.

I think the comments in this patch still need a good bit of work.
They tend to explain what the code does rather than the reason it does
it, and they tend to speak vaguely rather than precisely about things
happening in other places. For example, consider the need to set
partial_pathlist = NIL in create_grouping_paths(). Here's the existing
comment:

/*
* We've been using the partial pathlist for the grouped relation to hold
* partially aggregated paths, but that's actually a little bit bogus
* because it's unsafe for later planning stages -- like ordered_rel ---
* to get the idea that they can use these partial paths as if they didn't
* need a FinalizeAggregate step. Zap the partial pathlist at this stage
* so we don't get confused.
*/

Here's your comment about the same hazzard:

+ * For full aggregation, at this point we are already done with the
+ * finalization step and thus partial paths are no more needed. Keeping
+ * those will lead to some unwanted result later in the planning stage.
+ * Thus like create_grouping_paths(), clear them out.

Notice that your comment says that it will created an "unwanted
result" and that this result will happen "later", whereas the existing
comment is a lot more specific. It says exactly what the problem is
(FinalizeAggregate needed) and where the confusion will happen
(ordered_rel). Some other examples of comments with similar problems:

+ * If there are partial subpaths for parallelism, then we need to add
+ * gather path on top of the append. However, we only do this when full
+ * aggregation is required. For partial aggregation this can be done at
+ * later stage.

Doesn't really explain why we're doing any of those things, just says
that we are. Also, what later stage?

+ * For non-partial aggregation path, just need to add given
append path to
+ * a grouped_rel. Also, if caller requested a partial aggregation only,
+ * skip finalize step.

Again, why?

+ * Add all collected append paths into the grouped_rel. For partial
+ * aggregation mode we need to add a finalize agg node over an append
+ * path.

Why?

+ /* Similarly, for partial paths. Here we need to add gather
node too. */

Why?

+ * However, in partial aggregation mode this is done
at later stage,
+ * so skip it.

When?

Here's an example of a much better comment you wrote:

+ * In find_ec_member_for_tle(), child EC members are ignored
if they don't
+ * belong to the given relids. Thus, if this sort path is
based on a child
+ * relation, we must pass the relids of it. Otherwise, we will
end-up into
+ * an error requiring pathkey item.

I haven't studied this patch in enough depth yet to figure out whether
I think that makes sense, but clearly when I go to do that this
comment is going to be a big help in figuring it out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-01-31 19:45:46 Re: JIT compiling with LLVM v9.0
Previous Message Andres Freund 2018-01-31 18:37:52 Re: JIT compiling with LLVM v9.0