Re: [HACKERS] Partition-wise aggregation/grouping

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-06 20:15:41
Message-ID: CA+TgmoYp5+_VQvt-VTJETTF3PkxnTNJm5PHAdiTtpox8vgqvjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
<jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> This is in-lined with enable_hashagg GUC. Do you think
> enable_partitionwise_aggregate seems better? But it will be not consistent
> with other GUC names like enable_hashagg then.

Well, if I had my way, enable_hashagg would be spelled
enable_hash_aggregate, too, but I wasn't involved in the project that
long ago. 100% consistency is hard to achieve here; the perfect
parallel of enable_hashagg would be enable_partitionwiseagg, but then
it would be inconsistent with enable_partitionwise_join unless we
renamed it to enable_partitionwisejoin, which I would rather not do.
I think the way the enable_blahfoo names were done was kinda
shortsighted -- it works OK as long as blahfoo is pretty short, like
mergejoin or hashagg or whatever, but if you have more or longer words
then I think it's hard to see where the word boundaries are without
any punctuation. And if you start abbreviating then you end up with
things like enable_pwagg which are not very easy to understand. So I
favor spelling everything out and punctuating it.

> So the code for doing partially aggregated partial paths and partially
> aggregated non-partial path is same except partial paths goes into
> partial_pathlist where as non-partial goes into pathlist of
> partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
> when isPartialAgg = true seems correct. Also as we have decided, this
> function is responsible to create all partially aggregated paths including
> both partial and non-partial.
>
> Am I missing something?

Hmm. I guess not. I think I didn't read this code well enough
previously. Please find attached proposed incremental patches (0001
and 0002) which hopefully make the code in this area a bit clearer.

>> + /*
>> + * If there are any fully aggregated partial paths present,
>> may be because
>> + * of parallel Append over partitionwise aggregates, we must stick
>> a
>> + * Gather or Gather Merge path atop the cheapest partial path.
>> + */
>> + if (grouped_rel->partial_pathlist)
>>
>> This comment is copied from someplace where the code does what the
>> comment says, but here it doesn't do any such thing.
>
> Well, these comments are not present anywhere else than this place. With
> Parallel Append and Partitionwise aggregates, it is now possible to have
> fully aggregated partial paths now. And thus we need to stick a Gather
> and/or Gather Merge atop cheapest partial path. And I believe the code does
> the same. Am I missing something?

I misread the code. Sigh. I should have waited until today to send
that email and taken time to study it more carefully. But I still
don't think it's completely correct. It will not consider using a
pre-sorted path; the only strategies it can consider are cheapest path
+ Gather and cheapest path + explicit Sort (even if the cheapest path
is already correctly sorted!) + Gather Merge. It should really do
something similar to what add_paths_to_partial_grouping_rel() already
does: first call generate_gather_paths() and then, if the cheapest
partial path is not already correctly sorted, also try an explicit
Sort + Gather Merge. In fact, it looks like we can actually reuse
that logic exactly. See attached 0003 incremental patch; this changes
the outputs of one of your regression tests, but the new plan looks
better.

Some other notes:

There's a difference between performing partial aggregation in the
same process and performing it in a different process. hasNonPartial
tells us that we can't perform partial aggregation *at all*;
hasNonSerial only tells us that partial and final aggregation must
happen in the same process. This patch could possibly take advantage
of partial aggregation even when hasNonSerial is set. Finalize
Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
is OK with hasNonSerial = true as long as hasNonPartial = false. Now,
the bad news is that for this to actually work we'd need to define new
values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
complexity that adds. However, if we're not going to do that, I think
we'd better at last add some comments about it suggesting that someone
might want to do something about it in the future.

I think that, in general, it's a good idea to keep the number of times
that create_grouping_paths() does something which is conditional on
whether child_data is NULL to a minimum. I haven't looked at what
Ashutosh tried to do there so I don't know whether it's good or bad,
but I like the idea, if we can do it cleanly.

It strikes me that we might want to consider refactoring things so
that create_grouping_paths() takes the grouping_rel and
partial_grouping_rel as input arguments. Right now, the
initialization of the child grouping and partial-grouping rels is
partly in try_partitionwise_aggregate(), which considers marking one
of them (but never both?) as dummy rels and create_grouping_paths()
which sets reloptkind, serverid, userid, etc. The logic of all of
this is a little unclear to me. Presumably, if the input rel is
dummy, then both the grouping_rel and the partial_grouping_rel are
also dummy. Also, presumably we should set the reloptkind correctly
as soon as we create the rel, not at some later stage.

Or maybe what we should do is split create_grouping_paths() into two
functions. Like this:

if (child_data)
{
partial_grouping_target = child_data->partialRelTarget;
partially_grouped_rel->reltarget = partial_grouping_target;
agg_partial_costs = child_data->agg_partial_costs;
agg_final_costs = child_data->agg_final_costs;
}

--- SPLIT IT HERE ---

/* Apply partitionwise aggregation technique, if possible. */
try_partitionwise_grouping(root, input_rel, grouped_rel,
partially_grouped_rel, target,
partial_grouping_target, agg_costs,
agg_partial_costs, agg_final_costs, gd,
can_sort, can_hash, havingQual, isPartialAgg);

It seems to me that everything from that point to the end is doing the
path generation and it's all pretty much the same for the parent and
child cases. But everything before that is either stuff that doesn't
apply to the child case at all (like the degenerate grouping case) or
stuff that should be done once and passed down (like
can_sort/can_hash). The only exception I see is some of the stuff
that sets up the upper rel at the top of the function, but maybe that
logic could be refactored into a separate function as well (like
initialize_grouping_rel). Then, instead of try_partitionwise_join()
actually calling create_grouping_paths(), it would call
initialize_grouping_rel() and then the path-adding function that we
split off from the bottom of the current create_grouping_paths(),
basically skipping all that stuff in the middle that we don't really
want to do in that case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0003-Refactor-code-to-add-Gather-Gather-Merge.patch application/octet-stream 5.5 KB
0002-Rejigger-test-for-grouped_rel-pathlist-NIL.patch application/octet-stream 2.0 KB
0001-Tidy-up-calls-to-add_paths_to_partial_grouping_rel.patch application/octet-stream 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-06 20:16:01 Re: JIT compiling with LLVM v11
Previous Message Peter Eisentraut 2018-03-06 20:13:46 Re: Rewriting the test of pg_upgrade as a TAP test - take two