Re: WIP: Aggregation push-down - take2

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "zhihui(dot)fan1213(at)gmail(dot)com" <zhihui(dot)fan1213(at)gmail(dot)com>, "legrand_legrand(at)hotmail(dot)com" <legrand_legrand(at)hotmail(dot)com>, "daniel(at)yesql(dot)se" <daniel(at)yesql(dot)se>
Subject: Re: WIP: Aggregation push-down - take2
Date: 2023-01-04 10:12:21
Message-ID: CALDaNm2KsNYEQ-T2TDPEiicHRVbbz8JinSzm6hXGvDswpfkqfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 17 Nov 2022 at 16:34, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> > Hi,
> >
> > I did a quick initial review of the v20 patch series. I plan to do a
> > more thorough review over the next couple days, if time permits. In
> > general I think the patch is in pretty good shape.
>
> Thanks.
>
> > I've added a bunch of comments in a number of places - see the "review
> > comments" parts for each of the original parts. That should make it
> > easier to deal with all the items. I'll go through the main stuff here:
>
> Unless I miss something, all these items are covered in context below, except
> for this one:
>
> > 7) when I change enable_agg_pushdown to true and run regression tests, I
> > get a bunch of failures like
> >
> > ERROR: WindowFunc found where not expected
> >
> > Seems we don't handle window functions correctly somewhere, or maybe
> > setup_aggregate_pushdown should check/reject hasWindowFuncs too?
>
> We don't need to reject window functions, window functions are processed after
> grouping/aggregation. The problem I noticed in the regression tests was that a
> window function referenced a (non-window) aggregate. We just need to ensure
> that pull_var_clause() recurses into that window function in such cases:
>
> Besides the next version, v21-fixes.patch file is attached. It tries to
> summarize all the changes between v21 and v22. (I wonder if this attachment
> makes the cfbot fail.)
>
>
> diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> index 8e913c92d8..8dc39765f2 100644
> --- a/src/backend/optimizer/plan/initsplan.c
> +++ b/src/backend/optimizer/plan/initsplan.c
> @@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
> Assert(root->grouped_var_list == NIL);
>
> tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
> - PVC_INCLUDE_AGGREGATES);
> + PVC_INCLUDE_AGGREGATES |
> + PVC_RECURSE_WINDOWFUNCS);
>
> /*
> * Although GroupingFunc is related to root->parse->groupingSets, this
>
>
> > ---
> > src/backend/optimizer/util/relnode.c | 11 +++++++++++
> > src/include/nodes/pathnodes.h | 3 +++
> > 2 files changed, 14 insertions(+)
> >
> > diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
> > index 94720865f47..d4367ba14a5 100644
> > --- a/src/backend/optimizer/util/relnode.c
> > +++ b/src/backend/optimizer/util/relnode.c
> > @@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
> > /*
> > * build_rel_hash
> > * Construct the auxiliary hash table for relation specific data.
> > + *
> > + * XXX Why is this renamed, leaving out the "join" part? Are we going to use
> > + * it for other purposes?
>
> Yes, besides join relation, it's used to find the "grouped relation" by
> Relids. This change tries to follow the suggestion "Maybe an appropriate
> preliminary patch ..." in [1], but I haven't got any feedback whether my
> understanding was correct.
>
> > + * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> I think that the reason was that, with the patch applied, PlannerInfo contains
> multiple fields of the RelInfoList type, so build_rel_hash() needs an
> information which one it should process. Passing the exact field is simpler
> than passing PlannerInfo plus some additional information.
>
> > */
> > static void
> > build_rel_hash(RelInfoList *list)
> > @@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
> > /*
> > * find_rel_info
> > * Find a base or join relation entry.
> > + *
> > + * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> For the same reason that build_rel_hash() receives the list explicitly, see
> above.
>
> > + * XXX I don't understand why we need both this and find_join_rel.
>
> Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
> think that
>
> find_join_rel(root, relids);
>
> is a little bit easier to read than
>
> (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
>
> > */
> > static void *
> > find_rel_info(RelInfoList *list, Relids relids)
> > diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
> > index 0ca7d5ab51e..018ce755720 100644
> > --- a/src/include/nodes/pathnodes.h
> > +++ b/src/include/nodes/pathnodes.h
> > @@ -88,6 +88,9 @@ typedef enum UpperRelationKind
> > * present and valid when rel_hash is not NULL. Note that we still maintain
> > * the list even when using the hash table for lookups; this simplifies life
> > * for GEQO.
> > + *
> > + * XXX I wonder why we actually need a separate node, merely wrapping fields
> > + * that already existed ...
>
> This is so that the existing fields can still be printed out
> (nodes/outfuncs.c).
>
> > diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
> > index 2fd1a962699..6f6b7d0b93b 100644
> > --- a/src/backend/optimizer/README
> > +++ b/src/backend/optimizer/README
> > @@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
> > efficient to apply the partial aggregation to the output of base relation
> > scan, and finalize it when we have all relations of the query joined:
> >
> > +XXX review: Hmm, do we need to push it all the way down to base relations? Or
> > +would it make sense to do the agg on an intermediate level? Say, we're joining
> > +three tables A, B and C. Maybe the agg could/should be evaluated on top of join
> > +A+B, before joining with C? Say, maybe the aggregate references columns from
> > +both base relations?
> > +
> > EXPLAIN
> > SELECT a.i, avg(b.y)
> > FROM a JOIN b ON b.j = a.i
>
> Another example below does show the partial aggregates at join level.
>
> > +XXX Perhaps mention this may also mean the partial ggregate could be pushed
> > +to a remote server with FDW partitions?
>
> Even if it's not implemented in the current patch version?
>
> > +
> > Note that there's often no GROUP BY expression to be used for the partial
> > aggregation, so we use equivalence classes to derive grouping expression: in
> > the example above, the grouping key "b.j" was derived from "a.i".
> >
> > +XXX I think this is slightly confusing - there is a GROUP BY expression for the
> > +partial aggregate, but as stated in the query it may not reference the side of
> > +a join explicitly.
>
> ok, changed.
>
> > Also note that in this case the partial aggregate uses the "b.j" as grouping
> > column although the column does not appear in the query target list. The point
> > is that "b.j" is needed to evaluate the join condition, and there's no other
> > way for the partial aggregate to emit its values.
> >
> > +XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
> > +helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
> > +importantly, isn't this a direct consequence of the equivalence classes stuff
> > +mentioned in the preceding paragraph?
>
> The equivalence class is just a mechanism to derive expressions which are not
> explicitly mentioned in the query, but there's always a question whether you
> need to derive any expression for particular table or not. Here I tried to
> explain that the choice of join columns is related to the choice of grouping
> keys for the partial aggregate.
>
> I've deleted this paragraph and added a note to the previous one.
>
> > Besides base relation, the aggregation can also be pushed down to join:
> >
> > EXPLAIN
> > @@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
> > -> Hash
> > -> Seq Scan on a
> >
> > +XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
> > +my example with three tables. Maybe this suggests the initial reference to
> > +base relations is a bit confusing.
>
> I tried to use the simplest example to demonstrate the concepts, then extended
> it to the partially-aggregated joins.
>
> > +XXX I think this is a good explanation of the motivation for this patch, but
> > +maybe it'd be good to go into more details about how we decide if it's correct
> > +to actually do the pushdown, data structures etc. Similar to earlier parts of
> > +this README.
>
> Added two paragraphs, see "Regarding correctness...".
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index f00f900ff41..6d2c2f4fc36 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
> > /*
> > * Now that the sizes are known, we can estimate the sizes of the grouped
> > * relations.
> > + *
> > + * XXX Seems more consistent with code nearby.
> > */
> > - if (root->grouped_var_list)
> > - setup_base_grouped_rels(root);
> > + setup_base_grouped_rels(root);
>
> In general I prefer not calling a function if it's obvious that it's not
> needed, but on the other hand the test of the 'grouped_var_list' field may be
> considered disturbing from the caller's perspective. I've got no strong
> opinion on this, so I can accept this proposal.
>
> >
> > /*
> > - * setup_based_grouped_rels
> > + * setup_base_grouped_rels
> > * For each "plain" relation build a grouped relation if aggregate pushdown
> > * is possible and if this relation is suitable for partial aggregation.
> > */
>
> Fixed, thanks.
>
> > {
> > Index rti;
> >
> > + /* If there are no grouped relations, estimate their sizes. */
> > + if (!root->grouped_var_list)
> > + return;
> > +
>
> Accepted, but with different wording (s/relations/expressions/).
>
> > + /* XXX Shouldn't this check be earlier? Seems cheaper than the check
> > + * calling bms_nonempty_difference, for example. */
> > if (brel->reloptkind != RELOPT_BASEREL)
> > continue;
>
> Right, moved.
>
> > rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
> > - if (rel_grouped)
> > - {
> > - /* Make the relation available for joining. */
> > - add_grouped_rel(root, rel_grouped, agg_info);
> > - }
> > +
> > + /* XXX When does this happen? */
> > + if (!rel_grouped)
> > + continue;
> > +
> > + /* Make the relation available for joining. */
> > + add_grouped_rel(root, rel_grouped, agg_info);
>
> I'd use the "continue" statement if there was a lot of code in the "if
> (rel_grouped) {...}" branch, but no strong preference in this case, so
> accepted.
>
> > }
> > }
> >
> > @@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> > /* Plain relation */
> > set_plain_rel_pathlist(root, rel, rte);
> >
> > + /* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
> > +
> > /* Add paths to the grouped relation if one exists. */
> > rel_grouped = find_grouped_rel(root, rel->relids,
>
> Yes, it can. Moved.
>
> > @@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> >
> > /*
> > * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
> > + *
> > + * XXX I think this is potentially quite confusing, because the existing "add"
> > + * functions add_path and add_partial_path only check if the proposed path is
> > + * dominated by an existing path, pathkeys, etc. But this does more than that,
> > + * perhaps even constructing new path etc.
> > */
> > static void
> > add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>
> Maybe, but I don't have a good idea of an alternative name.
> create_group_path() already exists and the create_*_path() functions are
> rather low-level. Maybe generate_grouped_path(), and at the same time rename
> generate_grouping_paths() to generate_grouped_paths()? In general, the
> generate_*_path*() functions do non-trivial things and eventually call
> add_path().
>
> > @@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > else
> > elog(ERROR, "unexpected strategy %d", aggstrategy);
> >
> > + /*
> > + * Bail out if we failed to create a suitable aggregated path. This can
> > + * happen e.g. then the path does not support hashing (for AGG_HASHED),
> > + * or when the input path is not sorted.
> > + */
> > + if (agg_path == NULL)
> > + return;
> > +
> > /* Add the grouped path to the list of grouped base paths. */
> > - if (agg_path != NULL)
> > - add_path(rel, (Path *) agg_path);
> > + add_path(rel, (Path *) agg_path);
>
> ok, changed.
>
> > }
> >
> > /*
> > @@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> >
> > for (lev = 2; lev <= levels_needed; lev++)
> > {
> > - RelOptInfo *rel_grouped;
> > ListCell *lc;
> >
> > /*
> > @@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> > */
> > foreach(lc, root->join_rel_level[lev])
> > {
> > + RelOptInfo *rel_grouped;
> > +
> > rel = (RelOptInfo *) lfirst(lc);
>
> Sure, fixed.
>
> > diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> > index 8e913c92d8b..d7a9de9645e 100644
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
> > * each possible grouping expression.
> > *
> > * root->group_pathkeys must be setup before this function is called.
> > + *
> > + * XXX Perhaps this should check/reject hasWindowFuncs too?
>
> create_window_paths() is called after create_grouping_paths() (see
> grouping_planner()), so it should not care whether the input (possibly
> grouped) paths involve the aggregate push-down or not.
>
> > */
> > extern void
> > setup_aggregate_pushdown(PlannerInfo *root)
> > @@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
> > if (root->parse->hasTargetSRFs)
> > return;
> >
> > + /*
> > + * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
> > + * create_grouping_expr_grouped_var_infos to a function returning bool, and
> > + * only check that here.
> > + */
> > +
>
> Hm, it looks to me like too much "indirection", and also a decriptive function
> name would be tricky to invent.
>
> > /* Create GroupedVarInfo per (distinct) aggregate. */
> > create_aggregate_grouped_var_infos(root);
> >
> > @@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
> > * Now that we know that grouping can be pushed down, search for the
> > * maximum sortgroupref. The base relations may need it if extra grouping
> > * expressions get added to them.
> > + *
> > + * XXX Shouldn't we do that only when adding extra grouping expressions?
> > */
> > Assert(root->max_sortgroupref == 0);
> > foreach(lc, root->processed_tlist)
>
> We don't know at this (early) stage whether those "extra grouping expression"
> will be needed for at least one relation. (max_sortgroupref is used by
> create_rel_agg_info())
>
> > diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> > index 0ada3ba3ebe..2f4db69c1f9 100644
> > --- a/src/backend/optimizer/plan/planner.c
> > +++ b/src/backend/optimizer/plan/planner.c
> > @@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
> > /*
> > * The non-partial paths can come either from the Gather above or from
> > * aggregate push-down.
> > + *
> > + * XXX I can't quite convince myself this is correct. How come it's fine
> > + * to check pathlist and then call set_cheapest() on partially_grouped_rel?
> > + * Maybe it's correct and the comment merely needs to explain this.
>
> It's not clear to me what makes you confused. Without my patch, the code looks
> like this:
>
> if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
> {
> gather_grouping_paths(root, partially_grouped_rel);
> set_cheapest(partially_grouped_rel);
> }
>
> Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
> patch calls set_cheapest() independent from gather_grouping_paths() because
> the paths requiring the aggregate finalization can also be generated by the
> aggregate push-down feature.
>
> > */
> > if (partially_grouped_rel && partially_grouped_rel->pathlist)
> > set_cheapest(partially_grouped_rel);
> > @@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
> > * push-down.
> > */
> > partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
> > +
> > + /*
> > + * If the relation already exists, it must have been created by aggregate
> > + * pushdown. We can't check how exactly it got created, but we can at least
> > + * check that aggregate pushdown is enabled.
> > + */
> > Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
>
> ok, done.
>
> > @@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> > * If we can't partially aggregate partial paths, and we can't partially
> > * aggregate non-partial paths, then don't bother creating the new
> > * RelOptInfo at all, unless the caller specified force_rel_creation.
> > + *
> > + * XXX Not sure why we're checking the partially_grouped_rel here?
> > */
> > if (cheapest_total_path == NULL &&
> > cheapest_partial_path == NULL &&
>
> I think (but not verified yet) that without this test the function could
> return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
> realize now that there's no aggregate push-down specific processing in the
> function. I've adjusted it so that it does return, but the returned value is
> partially_grouped_rel rather than NULL.
>
> > @@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,
> >
> > /*
> > * Build a new upper relation to represent the result of partially
> > - * aggregating the rows from the input relation.
> > + * aggregating the rows from the input relation. The relation may
> > + * already exist due to aggregate pushdown, in which case we don't
> > + * need to create it.
> > */
> > if (partially_grouped_rel == NULL)
> > partially_grouped_rel = fetch_upper_rel(root,
>
> ok, done.
>
> > @@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> > *
> > * If the target was already created for the sake of aggregate push-down,
> > * it should be compatible with what we'd create here.
> > + *
> > + * XXX Why is this checking reltarget->exprs? What does that mean?
> > */
> > if (partially_grouped_rel->reltarget->exprs == NIL)
> > partially_grouped_rel->reltarget =
>
> I've added this comment:
>
> * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
> * push-down generated no paths), it created an empty target. Should we
> * change the convention and have it assign NULL to reltarget instead? Or
> * should we introduce a function like is_pathtarget_empty()?
>
> > diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
> > index 7025ebf94be..395bd093d34 100644
> > --- a/src/backend/optimizer/util/pathnode.c
> > +++ b/src/backend/optimizer/util/pathnode.c
> > @@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
> > }
> >
> > /*
> > + * create_agg_sorted_path
> > + * Creates a pathnode performing sorted aggregation/grouping
> > + *
> > * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
> > *
> > * NULL is returned if sorting of subpath output is not suitable.
> > + *
> > + * XXX I'm a bit confused why we need this? We now have create_agg_path and also
> > + * create_agg_sorted_path and create_agg_hashed_path.
>
> Do you mean that the function names are confusing? The functions
> create_agg_sorted_path() and create_agg_hashed_path() do some checks /
> preparation for the call of the existing function create_agg_path(), which is
> more low-level. Should the names be something like
> create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?
>
> > + *
> > + * XXX This assumes the input path to be sorted in a suitable way, but for
> > + * regular aggregation we check that separately and then perhaps add sort
> > + * if needed (possibly incremental one). That is, we don't do such checks
> > + * in create_agg_path. Shouldn't we do the same thing before calling this
> > + * new functions?
> > */
> > AggPath *
> > create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > @@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > agg_exprs = agg_info->agg_exprs;
> > target = agg_info->target;
>
> Likewise, it seems that you'd like to see different function name and maybe
> different location of this function. Both create_agg_sorted_path() and
> create_agg_hashed_path() are rather wrappers for create_agg_path().
>
> >
> > + /* Bail out if the input path is not sorted at all. */
> > if (subpath->pathkeys == NIL)
> > return NULL;
>
> ok, done.
>
> > @@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >
> > /*
> > * Find all query pathkeys that our relation does affect.
> > + *
> > + * XXX Not sure what "that our relation does affect" means? Also, we
> > + * are not looking at query_pathkeys but group_pathkeys, so that's a
> > + * bit confusing. Perhaps something like this would be better:
> > + *
>
> Indeed, the check of pathkeys was weird, I've reworked it.
>
> > @@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > }
> > }
> >
> > + /* Bail out if the subquery has no pathkeys for the grouping. */
> > if (key_subset == NIL)
> > return NULL;
> >
> > - /* Check if AGG_SORTED is useful for the whole query. */
> > + /*
> > + * Check if AGG_SORTED is useful for the whole query.
> > + *
> > + * XXX So this means we require the group pathkeys matched to the
> > + * subpath have to be a prefix of subpath->pathkeys. Why is that
> > + * necessary? We'll reduce the cardinality, and in the worst case
> > + * we'll have to add a separate sort (full or incremental). Or we
> > + * could finalize using hashed aggregate.
>
> Although with different arguments, pathkeys_contained_in() is still used in
> the new version of the patch. I've added a TODO comment about the incremental
> sort (it did not exist when I was writing the patch), but what do you mean by
> "reducing the cardinality"? Eventually the partial aggregate should reduce the
> cardinality, but for the AGG_SORT strategy to work, the input sorting must be
> such that the executor can recognize the group boundaries.
>
> > + *
> > + * XXX Doesn't seem to change any regression tests when disabled.
> > + */
> > if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
> > return NULL;
>
> "disabled" means removal of this part (including the return statement), or
> returning NULL unconditionally? Whatever you mean, please check with the new
> version.
>
> > @@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > result = create_agg_path(root, rel, subpath, target,
> > AGG_SORTED, aggsplit,
> > agg_info->group_clauses,
> > - NIL,
> > + NIL, /* qual for HAVING clause */
> > &agg_costs,
> > dNumGroups);
>
> ok, done here as well as in create_agg_hashed_path().
>
> > @@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
> > &agg_costs,
> > dNumGroups);
> >
> > + /*
> > + * XXX But we can spill to disk in hashagg now, no?
> > + */
> > if (hashaggtablesize < work_mem * 1024L)
> > {
>
> Yes, we can. It wasn't possible while I was writing the patch. Fixed.
>
> > diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> > index 868d21c351e..6e87ada684b 100644
> > --- a/src/backend/utils/misc/postgresql.conf.sample
> > +++ b/src/backend/utils/misc/postgresql.conf.sample
> > @@ -388,6 +388,7 @@
> > #enable_seqscan = on
> > #enable_sort = on
> > #enable_tidscan = on
> > +#enable_agg_pushdown = on
>
> Done.
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index 1055ea70940..05192ca549a 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> > RelOptInfo *rel_plain, RelAggInfo *agg_info)
> > {
> > ListCell *lc;
> > - Path *path;
> > + Path *path; /* XXX why declare at this level, not in the loops */
> >
>
> I usually do it this way, not sure why. Perhaps because it's less typing :-) I
> changed that in the next version so that we don't waste time arguing about
> unimportant things.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5212d447fa53518458cbe609092b347803a667c5 ===
=== applying patch ./v21-fixes.patch
patching file src/backend/optimizer/README
Hunk #1 FAILED at 1186.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/optimizer/README.rej
patching file src/backend/optimizer/path/allpaths.c
Hunk #1 FAILED at 197.
Hunk #2 FAILED at 341.
Hunk #3 succeeded at 339 with fuzz 1 (offset -11 lines).
Hunk #4 succeeded at 1014 with fuzz 2 (offset 647 lines).
Hunk #5 FAILED at 378.
Hunk #6 FAILED at 563.
Hunk #7 succeeded at 2793 with fuzz 1 (offset 1948 lines).
Hunk #8 FAILED at 867.
Hunk #9 FAILED at 3439.
Hunk #10 FAILED at 3590.
Hunk #11 succeeded at 3430 (offset -182 lines).
7 out of 11 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/allpaths.c.rej
patching file src/backend/optimizer/path/costsize.c

[1] - http://cfbot.cputube.org/patch_41_3764.log

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-01-04 10:15:05 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message vignesh C 2023-01-04 10:06:27 Re: Prefetch the next tuple's memory during seqscans