Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-21 15:33:23
Message-ID: CAM2+6=Xidt0DBct7gvFhDNYSU+iFMW9EAQehs1Y-bZZHrXb6qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 21, 2018 at 7:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> >> In the patch as proposed, create_partial_grouping_paths() can get
> >> called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
> >> wrong.
> >
> > I don't think so. For parallel case, we do check that. And for
> partitionwise
> > aggregation check, it was checked inside can_partitionwise_grouping()
> > function and flags were set accordingly. Am I missing something?
>
> Well, one of us is missing something somewhere. If
> GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
> grouping, and if create_partial_grouping_paths() is where partial
> grouping happens, then we should only be calling the latter if the
> former is set. I mean, how can it make sense to create
> partially-grouped paths if we're not allowed to do partial grouping?
>

Yes, that's true. If we are not allowed to do partial grouping,
partially_grouped paths should not be created.

However, what I mean is that the partitionwise related checks added, if
evaluates to true it implies that GROUPING_CAN_PARTIAL_AGG is also set as
it was checked earlier. And thus does not need explicit check again.

Anyway, after your refactoring, it becomes more readable now.

>
> > I have tweaked these conditions and posted in a separate patch (0006).
> > However, I have merged all your three patches in one (0005).
>
> OK, thanks. I wasn't sure I had understood what was going on, so
> thanks for checking it.
>
> Thanks also for keeping 0004-0006 separate here, but I think you can
> flatten them into one patch in the next version.
>

OK. Sure.

>
> >> I think that if the last test in can_partitionwise_grouping were moved
> >> before the previous test, it could be simplified to test only
> >> (extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
> >> *perform_partial_partitionwise_aggregation.
> >
> > I think we can't do this way. If *perform_partial_
> partitionwise_aggregation
> > found to be true then only we need to check whether partial aggregation
> > itself is possible or not. If we are going to perform a full
> partitionwise
> > aggregation then test for can_partial_agg is not needed. Have I misread
> your
> > comments?
>
> It seems you're correct, because when I change it the tests fail. I
> don't yet understand why.
>
> Basically, the main patch seems to use three Boolean signaling mechanisms:
>
> 1. GROUPING_CAN_PARTITIONWISE_AGG
> 2. is_partial_aggregation
> 3. perform_partial_partitionwise_aggregation
>
> Stuff I don't understand:
>
> - Why is one of them a Boolean shoved into "flags", even though it's
> not static across the whole hierarchy like the other flags, and the
> other two are separate Booleans?
> - What do they all do, anyway?
>

Let me try to explain this:

1. GROUPING_CAN_PARTITIONWISE_AGG
Tells us whether or not partitionwise grouping and/or aggregation is ever
possible. If it is FALSE, other two have no meaning and they will be
useless. However, if it is TRUE, then only we attempt to create paths
partitionwise.
I have kept it in "flags" as it looks similar in behavior with other flag
members like can_sort, can_hash etc. And, for given grouped relation
whether parent or child, they all work similarly. But yes, for child
relation, we inherit can_sort/can_hash from the parent as they won't
change. But need to evaluate this for every child.
If required, I can move that to a GroupPathExtraData struct.

2. extra->is_partial_aggregation
This boolean var is used to identify at any given time whether we are
computing a full aggregation or a partial aggregation. This boolean is
necessary when doing partial aggregation to skip finalization. And also
tells us to use partially_grouped_rel when true.

3. extra->perform_partial_partitionwise_aggregation
This boolean var is used to instruct child that it has to create a
partially aggregated paths when TRUE. And then it transferred to
child_extra->is_partial_aggregation in
create_partitionwise_grouping_paths().

Basically (3) is required as we wanted to create a partially_grouped_rel
upfront. So that if the child is going to create a partially aggregated
paths, they can append those into the parent's partially grouped rel and
thus we need to create that before even we enter into the child paths
creation.
Since (3) is only valid if (1) is true, we need to compute (1) upfront too.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-03-21 15:38:03 Re: handling of heap rewrites in logical decoding
Previous Message Tom Lane 2018-03-21 15:33:12 Re: WIP: a way forward on bootstrap data