Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-22 10:15:56
Message-ID: CAM2+6=U9CM1oodmcH_GCvcW4ujrWYWoX6T_aeBzWEgomrULcXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > Let me try to explain this:
> > 1. GROUPING_CAN_PARTITIONWISE_AGG
> > 2. extra->is_partial_aggregation
> > 3. extra->perform_partial_partitionwise_aggregation
>
> Please find attached an incremental patch that attempts to refactor
> this logic into a simpler form. What I've done is merged all three of
> the above Booleans into a single state variable called 'patype', which
> can be one of PARTITIONWISE_AGGREGATE_NONE,
> PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
> When create_ordinary_grouping_paths() is called, extra.patype is the
> value for the parent relation; that function computes a new value and
> passes it down to create_partitionwise_grouping_paths(), which inserts
> into the new 'extra' structure for the child.
>
> Essentially, in your system, extra->is_partial_aggregation and
> extra->perform_partial_partitionwise_aggregation both corresponded to
> whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
> former indicated whether the *parent* was doing partition-wise
> aggregation (and thus we needed to generate only partial paths) while
> the latter indicated whether the *current* relation was doing
> partition-wise aggregation (and thus we needed to force creation of
> partially_grouped_rel). This took me a long time to understand
> because of the way the fields were named; they didn't indicate that
> one was for the parent and one for the current relation. Meanwhile,
> GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
> aggregate should be tried at all for the current relation; there was
> no analogous indicator for the parent relation because we can't be
> processing a child at all if the parent didn't decide to do
> partition-wise aggregation. So to know what was happening for the
> current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
> extra->perform_partial_partitionwise_aggregation, and to know what was
> happening for the parent relation you just looked at
> extra->is_partial_aggregation. With this proposed refactoring patch,
> there's just one patype value at each level, which at least to me
> seems simpler. I tried to improve the comments somewhat, too.
>

Leeks cleaner now. Thanks for refactoring it.

I have merged these changes and flatten all previuos changes into the main
patch.

>
> You have some long lines that it would be good to break, like this:
>
> child_extra.targetList = (List *) adjust_appendrel_attrs(root,
>
> (Node *) extra->targetList,
>
> nappinfos,
>
> appinfos);
>
> If you put a newline after (List *), the formatting will come out
> nicer -- it will fit within 80 columns. Please go through the patches
> and make these kinds of changes for lines over 80 columns where
> possible.
>

OK. Done.
I was under impression that it is not good to split the first line. What if
in split version, while applying any new patch or in Merge process
something gets inserted between them? That will result into something
unexpected. But I am not sure if that's ever a possibility.

> I guess we'd better move the GROUPING_CAN_* constants to a header
> file, if they're going to be exposed through GroupPathExtraData. That
> can go in some refactoring patch.
>

Yep. Moved that into 0003 where we create GroupPathExtraData.

> Is there a good reason not to use input_rel->relids as the input to
> fetch_upper_rel() in all cases, rather than just at subordinate
> levels?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
partition-wise-agg-v23.tar.gz application/gzip 30.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-03-22 10:22:56 Re: constraint exclusion and nulls in IN (..) clause
Previous Message Dean Rasheed 2018-03-22 09:56:11 Re: MCV lists for highly skewed distributions