Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-21 12:01:28
Message-ID: CAM2+6=VSC-TvUxSOJ62UCTAgTWv2JvmyeBafLLzpfvNg=yi_0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 21, 2018 at 2:04 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > I have added all these three patches in the attached patch-set and
> rebased
> > my changes over it.
> >
> > However, I have not yet made this patch-set dependednt on UPPERREL_TLIST
> > changes you have proposed on another mail-thread and thus it has 0001
> patch
> > refactoring the scanjoin issue.
> > 0002, 0003 and 0004 are your patches added in this patchset.
> > 0005 and 0006 are further refactoring patches. 0006 adds a
> > GroupPathExtraData which stores mostly child variant data and costs.
> > 0007 is main partitionwise aggregation patch which is then rebased
> > accordingly.
> > 0008 contains testcase and 0009 contains FDW changes.
>
> Committed my refactoring patches (your 0002-0004).
>

Thanks Robert.

>
> Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
> like what happens is: we first build an Append path for the topmost
> scan/join rel. That uses paths from the individual relations that
> don't necessarily produce the final scan/join target. Then we mutate
> those relations in place during partition-wise aggregate so that they
> now do produce the final scan/join target and generate some more paths
> using the results. So there's an ordering dependency, and the same
> pathlist represents different things at different times. That is, I
> suppose, not technically any worse than what we're doing for the
> scan/join rel's pathlist in general, but here there's the additional
> complexity that the paths get used both before and after being
> mutated. The UPPERREL_TLIST proposal would clean this up, although I
> realize that has unresolved issues.
>
> In create_partial_grouping_paths, the loop that does "for (i = 0; i <
> 2; i++)" is not exactly what I had in mind when I said that we should
> use two loops. I did not mean a loop with two iterations. I meant
> adding a loop like foreach(lc, input_rel->pathlist) in each place
> where we currently have a loop like
> foreach(input_rel->partial_pathlist).

The path creation logic for partial_pathlist and pathlist was identical and
thus I thought of just loop over it twice switching the pathlist, so that
we have minimal code to maintain. But yes I agree that it adds additional
complexity.

See 0001, attached.
>

Looks great. Thanks.

>
> Don't write if (a) Assert(b) but rather Assert(!a || b). See 0002,
> attached.
>

OK. Noted.

> In the patch as proposed, create_partial_grouping_paths() can get
> called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
> wrong.

I don't think so. For parallel case, we do check that. And for
partitionwise aggregation check, it was checked inside
can_partitionwise_grouping() function and flags were set accordingly. Am I
missing something?

> If can_partial_agg() isn't accurately determining whether
> partial aggregation is possible,

I think it does accurately determine.
if (!parse->hasAggs && parse->groupClause == NIL)
is only valid for DISTINCT queries which we are anyway not handling here
and for partitionwise aggregate it won't be true otherwise it will be a
degenerate grouping case.

> and as Ashutosh and I have been
> discussing, there's room for improvement in that area, then that's a
> topic for some other set of patches. Also, the test in
> create_ordinary_grouping_paths for whether or not to call
> create_partial_grouping_paths() is super-complicated and uncommented.
> I think a simpler approach is to allow create_partial_grouping_paths()
> the option of returning NULL. See 0003, attached.
>

Thanks for simplifying it.

However, after this simplification, we were unnecessary creating
non-parallel partial aggregation paths for the root input rel when not
needed.
Consider a case where we need a partial aggregation from a child, in this
case, extra->is_partial_aggregation = 0 at root level entry as the parent
is still doing full aggregation but
perform_partial_partitionwise_aggregation is true, which tells a child to
perform partial partitionwise aggregation. In this case,
cheapest_total_path will be set and thus we will go ahead and create
partial aggregate paths for the parent rel, which is not needed.

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).

>
> make_grouping_rel() claims that "For now, all aggregated paths are
> added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
> longer have only one grouped upper rel.
>

Done.

>
> I'm having a heck of a time understanding what is_partial_aggregation
> and perform_partial_partitionwise_aggregation are supposed to be
> doing.

As you said correctly, is_partial_aggregation denotes that we are doing
ONLY a partial aggregation at this level of partitioning hierarchy whereas
perform_partial_partitionwise_aggregation is used to instruct the child
whether it should perform partial or full aggregation at its level. Since
we need to create a partially_grouped_rel we evaluate all these
possibilities and thus need to pass those to the child so that child will
not need to compute it again.

It seems like is_partial_aggregation means that we should ONLY
> do partial aggregation, which is not exactly what the name implies.
>

I think it says we are doing a partial aggregation and thus implies to use
partially_grouped_rel and skip finalization step.

It also seems like perform_partial_partitionwise_aggregation and
> is_partial_aggregation differ only in that they apply to the current
> level and to the child level respectively;

It's the other way. is_partial_aggregation is applied to the current level
and perform_partial_partitionwise_aggregation applies to child level.

can't we merge these
> somehow so that we don't need both of them?
>

I think we can't as they apply at two different levels. A scenario in which
we are doing full aggregation at level 1 and need to perform partial
aggregation at level 2, they are different. But yes, they both will be same
if both the levels are doing same. But can't merge those.

Do you think any better names as it seems confusing?

> I think that if the last test in can_partitionwise_grouping were moved
> before the previous test, it could be simplified to test only
> (extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
> *perform_partial_partitionwise_aggregation.
>

I think we can't do this way. If *perform_partial_partitionwise_aggregation
found to be true then only we need to check whether partial aggregation
itself is possible or not. If we are going to perform a full partitionwise
aggregation then test for can_partial_agg is not needed. Have I misread
your comments?

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
partition-wise-agg-v22.tar.gz application/gzip 32.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2018-03-21 12:12:45 pg_basebackup: Missing newlines in some error messages
Previous Message Peter Eisentraut 2018-03-21 11:48:02 Re: [HACKERS] taking stdbool.h into use