Re: [HACKERS] Partition-wise aggregation/grouping

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-15 18:46:25
Message-ID: CA+TgmobAw6V6iojub+Ajs_4=dKakZouXr4i=vjM2gd-obmF30Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 15, 2018 at 11:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm going to go spend some time looking at 0005 next. It looks to me
> like it's generally going in a very promising direction, but I need to
> study the details more.

On further study this patch is doing a number of things, some of which
seem like better ideas than others. Splitting out the degenerate
grouping case looks like a great idea. I've committed a patch to do
this loosely based on 0005, but I whacked it around quite a bit. I
think the result is cleaner than what you had.

As far as the stuff in GroupPathExtraData extra is concerned:

- can_hash and can_sort look good; we can precompute them once and
reuse them for every grouping relation. Cool.

- can_partial_agg looks a bit pointless. You're not going to save
many CPU cycles by computing a value that is derived from two Booleans
and storing the result in another Boolean variable.

- partial_costs_set. The comment in compute_group_path_extra_data
doesn't look good. It says "Set partial aggregation costs if we are
going to calculate partial aggregates in make_grouping_rels()", but
what it means is something more like "agg_partial_costs and
agg_final_costs are not valid yet". I also wonder if there's a way
that we can figure out in advance whether we're going to need to do
this and just do it at the appropriate place in the code, as opposed
to doing it lazily. Even if there were rare cases where we did it
unnecessarily I'm not sure that would be any big deal.

- agg_partial_costs and agg_final_costs themselves seem OK.

- consider_parallel is only used in make_grouping_rels and could be
replaced with a local variable. Its comment in relation.h doesn't
make a lot of sense either, as it is not used to double-check
anything.

- The remaining fields vary across the partitioning hierarchy, and it
seems a little strange to store them in this structure alongside the
pre-computed stuff that doesn't change. I'm not quite sure what to do
about that; obviously passing around 15-20 arguments to a function
isn't too desirable either.

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels. It seems to me
for example that the way you're computing consider_parallel for the
child relations is kind of pointless. The parallel-safety of the
grouping_target can't vary across children, nor can that of the
havingQual; the only thing that can change is whether the input rel
has consider_parallel set. You could cater to that by having
GroupPathExtraData do something like extra.grouping_is_parallel_safe =
target_parallel_safe && is_parallel_safe(root, havingQual) and then
set each child's consider parallel flag to
input_rel->consider_parallel && extra.grouping_is_parallel_safe.

Similarly, right now the way the patch sets the reltargets for
grouping rels and partially grouping rels is a bit complex.
make_grouping_rels() calls make_partial_grouping_target() separately
for each partial grouping rel, but for non-partial grouping rels it
gets the translated tlist as an argument. Could we instead consider
always building the tlist by translation from the parent, that is, a
child grouped rel's tlist is the translation of the parent
grouped_rel's tlist, and the child partially grouped rel's tlist is
the translation of the parent partially_grouped_rel's tlist? If you
could both make that work and find a different place to compute the
partial agg costs, make_grouping_rels() would get a lot simpler or
perhaps go away entirely.

I don't like this condition which appears in that function:

if (extra->try_parallel_aggregation || force_partial_agg ||
(extra->partitionwise_grouping &&
extra->partial_partitionwise_grouping))

The problem with that is that it's got to exactly match the criteria
for whether we're going to need the partial_grouping_rel. If it's
true when we are not using partial paths, then you've missed an
optimization; in the reverse case, we'll probably crash or fail to
consider paths we should have considered. It is not entirely
straightforward to verify that this test is correct.
add_paths_to_partial_grouping_rel() gets called if
extra->try_parallel_aggregation is true or if
extra->is_partial_aggregation is true, but the condition doesn't test
extra->is_partial_aggregation at all. The other way that we can end up
using partially_grouped_rel is if create_partitionwise_grouping_paths
is called, but it just silently fails to do anything if we have no
partially_grouped_rel. Putting all that together, do the conditions
under which a partially_grouped_rel gets created match the conditions
under which we want to have one? Beats me! Moreover, even if it's
correct now, I think that the chances that the next person who
modifies this code will manage to keep it correct are not great. I
think we need to create the partial grouping rel somewhere in the code
that's closer to where it's actually needed, so that we don't have so
much action at a distance, or at least have a simpler and more
transparent set of tests.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeremy Finzel 2018-03-15 20:19:50 Re: worker_spi.naptime in worker_spi example
Previous Message Andres Freund 2018-03-15 18:34:03 Re: JIT compiling with LLVM v11