Re: [HACKERS] Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Date: 2018-01-02 07:06:36
Message-ID: CAM2+6=Uqc1FSaGcD7WsLmMmpvY8WirwUu0eNXjOzABsG2orXMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached patchset with all the review comments reported so far.

On Fri, Dec 1, 2017 at 6:11 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> Continuing with review of 0007.
>
> +
> + /* Copy input rels's relids to grouped rel */
> + grouped_rel->relids = input_rel->relids;
>
> Isn't this done in fetch_upper_rel()? Why do we need it here?
> There's also a similar hunk in create_grouping_paths() which doesn't look
> appropriate. I guess, you need relids in grouped_rel->relids for FDW.
> There are
> two ways to do this: 1. set grouped_rel->relids for parent upper rel as
> well,
> but then we should pass relids to fetch_upper_rel() instead of setting
> those
> later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids
> to
> all_baserels, if upper_rel->relids is NULL and don't set relids for a
> parent
> upper rel. I am fine with either of those.
>

Done. Opted second option.

>
> + /* partial phase */
> + get_agg_clause_costs(root, (Node *) partial_target->exprs,
> + AGGSPLIT_INITIAL_SERIAL,
> + &agg_partial_costs);
>
> IIUC, the costs for evaluating aggregates would not change per child. They
> won't be different for parent and any of the children. So, we should be
> able to
> calculate those once, save in "extra" and use repeatedly.
>

Yep. Done.

>
> + if (can_sort)
> + {
> + Path *path = cheapest_path;
> +
> + if (!(pathkeys_contained_in(root->group_pathkeys,
> + path->pathkeys)))
> [ .. clipped patch .. ]
> + NIL,
> + dNumGroups));
> + }
>
> We create two kinds of paths partial paths for parallel query and partial
> aggregation paths when group keys do not have partition keys. The comments
> and
> code uses partial to mean both the things, which is rather confusing. May
> be we
> should use term "partial aggregation" explicitly wherever it means that in
> comments and in variable names.
>

Agree. Used "partial aggregation" wherever applicable. Let me know if you
see any other place need this adjustments.

> I still feel that create_grouping_paths() and create_child_grouping_paths()
> have a lot of common code. While I can see that there are some pockets in
> create_grouping_paths() which are not required in
> create_child_grouping_paths()
> and vice-versa, may be we should create only one function and move those
> pockets under "if (child)" or "if (parent)" kind of conditions. It will be
> a
> maintenance burden to keep these two functions in sync in future. If we do
> not
> keep them in sync, that will introduce bugs.
>

Agree that keeping these two functions in sync in future will be a
maintenance burden, but I am not yet sure how to refactor them cleanly.
Will give one more try and update those changes in the next patchset.

> +
> +/*
> + * create_partition_agg_paths
> + *
> + * Creates append path for all the partitions and adds into the grouped
> rel.
>
> I think you want to write "Creates an append path containing paths from
> all the
> child grouped rels and adds into the given parent grouped rel".
>

Reworded as you said.

>
> + * For partial mode we need to add a finalize agg node over append path
> before
> + * adding a path to grouped rel.
> + */
> +void
> +create_partition_agg_paths(PlannerInfo *root,
> + RelOptInfo *grouped_rel,
> + List *subpaths,
> + List *partial_subpaths,
>
> Why do we have these two as separate arguments? I don't see any call to
> create_partition_agg_paths() through add_append_path() passing both of them
> non-NULL simultaneously. May be you want use a single list subpaths and
> another
> boolean indicating whether it's list of partial paths or regular paths.
>
>
After redesigning in the area of putting gather over append, I don't need
to pass all Append subpaths to this function at-all. Append is done by
add_paths_to_append_rel() itself. This function now just adds fanalization
steps as needed.
So, we don't have two lists now. And to know about partial paths, passed a
boolean instead. Please have a look and let me know if I missed any.

> +
> + /* For non-partial path, just create a append path and we are done. */
> This is the kind of confusion, I am talking about above. Here you have
> mentioned "non-partial path" which may mean a regular path but what you
> actually mean by that term is a path representing partial aggregates.
>

> + /*
> + * Partial partition-wise grouping paths. Need to create final
> + * aggregation path over append relation.
> + *
> + * If there are partial subpaths, then we need to add gather path
> before we
> + * append these subpaths.
>
> More confusion here.
>

Hopefully no more confusion in this new version.

> + */
> + if (partial_subpaths)
> + {
> + ListCell *lc;
> +
> + Assert(subpaths == NIL);
> +
> + foreach(lc, partial_subpaths)
> + {
> + Path *path = lfirst(lc);
> + double total_groups = path->rows *
> path->parallel_workers;
> +
> + /* Create gather paths for partial subpaths */
> + Path *gpath = (Path *) create_gather_path(root, grouped_rel,
> path,
> + path->pathtarget,
> NULL,
> + &total_groups);
> +
> + subpaths = lappend(subpaths, gpath);
>
> Using the argument variable is confusing and that's why you need two
> different
> List variables. Instead probably you could have another variable local to
> this
> function to hold the gather subpaths.
>

Done.

>
> AFAIU, the Gather paths that this code creates has its parent set to
> parent grouped
> rel. That's not correct. These partial paths come from children of grouped
> rel
> and each gather is producing rows corresponding to one children of grouped
> rel.
> So gather path's parent should be set to corresponding child and not parent
> grouped rel.
>

Yep.

>
> This code creates plans where there are multiple Gather nodes under an
> Append
> node. AFAIU, the workers assigned to one gather node can be reused until
> that
> Gather node finishes. Having multiple Gather nodes under an Append mean
> that
> every worker will be idle from the time that worker finishes the work till
> the
> last worker finishes the work. That doesn't seem to be optimal use of
> workers.
> The plan that we create with Gather on top of Append seems to be better.
> So, we
> should avoid creating one Gather node per child plans. Have we tried to
> compare
> performance of these two plans?
>

Agree. Having Gather on top of the Append is better. Done that way. It
resolves your previous comment too.

> + if (!IsA(apath, MergeAppendPath) && root->group_pathkeys)
> + {
> + spath = (Path *) create_sort_path(root,
> + grouped_rel,
> + apath,
> + root->group_pathkeys,
> + -1.0);
> + }
>
> The code here assumes that a MergeAppend path will always have pathkeys
> matching group_pathkeys. I believe that's true but probably we should have
> an
> Assert to make it clear and add comments. If that's not true, we will need
> to
> sort the output of MergeAppend OR discard MergeAppend paths which do not
> have
> pathkeys matching group_pathkeys.
>

Oops. Thanks for pointing out that. You are correct.
Added relevant check which checks for required pathkeys present or not.

>
>
> diff --git a/src/tools/pgindent/typedefs.list
> b/src/tools/pgindent/typedefs.list
> index b422050..1941468 100644
> --- a/src/tools/pgindent/typedefs.list
> +++ b/src/tools/pgindent/typedefs.list
> @@ -2345,6 +2345,7 @@ UnlistenStmt
> UnresolvedTup
> UnresolvedTupData
> UpdateStmt
> +UpperPathExtraData
> UpperRelationKind
> UpperUniquePath
> UserAuth
>
> Do we commit this file as part of the feature?
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

This patchset contains fixes for other review comments too.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
partition-wise-agg-v9.tar.gz application/x-gzip 35.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2018-01-02 07:30:34 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Mithun Cy 2018-01-02 06:09:16 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager