Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-03-05 08:51:08
Message-ID: CAM2+6=Vwyx+doCm425Oqfw5z=zxYuQ1gvpUzsc+8i2cQH58FkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 2, 2018 at 3:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > Attached new patchset after rebasing my changes over these changes and on
> > latest HEAD.
>
> + * We have already created a Gather or Gather Merge path atop
> cheapest
> + * partial path. Thus the partial path referenced by the
> Gather node needs
> + * to be preserved as adding new partial paths in same rel may
> delete this
> + * referenced path. To do this we need to clear the
> partial_pathlist from
> + * the partially_grouped_rel as we may add partial paths again
> while doing
> + * partitionwise aggregation. Keeping older partial path intact
> seems
> + * reasonable too as it might possible that the final path
> chosen which is
> + * using it wins, but the underneath partial path is not the
> cheapest one.
>
> This isn't a good design. You shouldn't create a Gather or Gather
> Merge node until all partial paths have been added. I mean, the point
> is to put a Gather node on top of the cheapest path, not the path that
> is currently the cheapest but might not actually be the cheapest once
> we've added them all.
>

To be honest, I didn't know that we should not generated Gather or Gather
Merge until we have all possible partial paths in place. I realize it
recently while debugging one issue reported by Rajkumar off-list. While
working on that fix, what I have observed is
- I have cheapest partial path with cost say 10, a Gather on it increased
cost to 11.
- Later when I add a partial path it has a cost say 9 but a gather on it
resulted is total cost to 12.
This means, the first Gather path is the cheapest one but not the
underneath partial path and unfortunately that got removed when my partial
path is added into the partial_pathlist.

Due to this, I thought it is better to have both paths valid and to avoid
deleting earlier cheapest partial_path, I chose to reset the
partially_grouped_rel->partial_pathlist.

But, yes per comment in generate_gather_paths() and as you said, we should
add Gather or Gather Merge only after we have done with all partial path
creation. Sorry for not knowing this before.

>
> +add_gather_or_gather_merge(PlannerInfo *root,
>
> Please stop picking generic function names for functions that have
> very specific purposes. I don't really think that you need this to be
> a separate function at all, but it it is certainly NOT a
> general-purpose function for adding a Gather or Gather Merge node.
>

OK. Got it now.

>
> + /*
> + * Collect statistics about aggregates for estimating costs of
> + * performing aggregation in parallel or in partial, if not
> already
> + * done. We use same cost for all the children as they will be
> same
> + * anyways.
> + */
>
> If it only needs to be done once, do we really have to have it inside
> the loop? I see that you're using the varno-translated
> partial_target->exprs and target->exprs, but if the specific varnos
> don't matter, why not just use the untranslated version of the targets
> before entering the loop? And if the specific varnos do matter, then
> presumably you need to do it every time.
>

Yes. It can be pulled outside a loop.

>
> This is not a full review, but I'm out of time for right now.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2018-03-05 08:56:41 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Masahiko Sawada 2018-03-05 08:47:06 Re: BUG #14999: pg_rewind corrupts control file global/pg_control