Re: Push down Aggregates below joins

From: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down Aggregates below joins
Date: 2018-07-06 12:58:40
Message-ID: 19362.1530881920@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is what I managed to hack so far. Now the patch can also handle the

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > Ah, cool! I missed that thread earlier. Yes, seems like we've been hacking on
> > the same feature. Let's compare!
> >
> > I've been using this paper as a guide:
> >
> > "Including Group-By in Query Optimization", by Surajit Chaudhuri and Kyuseok
> > Shim:
> >
> > Using the terms from that paper, my patch does only "Invariant Grouping
> > transormation", while yours can do "Simple Coalescing Grouping", which is more
> > general. In layman terms, my patch can push the Aggregate below a join, while
> > your patch can also split an Aggregate so that you do a partial aggregate
> > below the join, and a final stage above it.
> Thanks for the link. I've just checked the two approaches briefly so far.
> > My thinking was to start with the simpler Invariant Grouping transformation
> > first, and do the more advanced splitting into partial aggregates later, as
> > a separate patch.
> I think for this you need to make sure that no join duplicates already
> processed groups. I tried to implement PoC of "unique keys" in v5 of my patch
> [1], see 09_avoid_agg_finalization.diff. The point is that the final
> aggregation can be replaced by calling pg_aggregate(aggfinalfn) function if
> the final join generates only unique values of the GROUP BY expression.
> Eventually I considered this an additional optimization of my approach and
> postponed work on this to later, but I think something like this would be
> necessary for your approach. As soon as you find out that a grouped relation
> is joined to another relation in a way that duplicates the grouping
> expression, you cannot proceed in creating grouped path.

The current patch version does not check the uniqueness of grouping keys, so
it actually doe not produce the plans you wanted to see. For expeimental
purposes, you can comment out the (agg_kind == REL_AGG_KIND_SIMPLE) branch
near the bottom of make_join_rel_common_grouped() and see that the
agg_pushdown regression test will generate plans with AGGSPLIT_SIMPLE pushed
down. For example

Finalize HashAggregate
Group Key: p.i
-> Hash Join
Hash Cond: (p.i = c1.parent)
-> Seq Scan on agg_pushdown_parent p
-> Hash
-> Partial HashAggregate
Group Key: c1.parent
-> Seq Scan on agg_pushdown_child1 c1

will become

Hash Join
Hash Cond: (p.i = c1.parent)
-> Seq Scan on agg_pushdown_parent p
-> Hash
-> HashAggregate
Group Key: c1.parent
-> Seq Scan on agg_pushdown_child1 c1

The plan will look correct but it can generate duplicate grouping keys.

> > There's some difference in the way we're dealing with the grouped
> > RelOptInfos. You're storing them completely separately, in PlannerInfo's new
> > 'simple_grouped_rel_array' array and 'join_grouped_rel_list/hash'. I'm
> > attaching each grouped RelOptInfo to its parent.
> In the first version of my patch I added several fields to RelOptInfo
> (reltarget for the grouped paths, separate pathlist, etc.) but some people
> didn't like it. In the later versions I introduced a separate RelOptInfo for
> grouped relations, but stored it in a separate list. Your approach might make
> the patch a bit less invasive, i.e. we don't have to add those RelOptInfo
> lists / arrays to standard_join_search() and subroutines.

Done. I think this concept might eventually lead to simplification of
create_grouping_paths(), especially the in the special cases related to
partitioning. I just think so because it's more generic, but haven't tried

> > You're introducing a new GroupedVar expression to represent Aggrefs between
> > the partial and final aggregates, while I'm just using Aggref directly, above
> > the aggregate node. I'm not thrilled about introducing an new Var-like
> > expression. We already have PlaceHolderVars, and I'm always confused on how
> > those work. But maybe that's necessary to supprot partial aggregation?
> The similarity of GroupedVar and PlaceHolderVar is that they are evaluated at
> some place in the join tree and the result is only passed to the joins above
> and eventually to the query target, w/o being evaluated again. In contrast,
> generic expressions are evaluated in the query target (only the input Vars get
> propagated from lower nodes), but that's not what we want for 2-stage
> aggregation.

> In my patch GroupedVar represents either the result of partial aggregation
> (the value of the transient state) or a grouping expression which is more
> complex than a plain column reference (Var expression).

I'm still not convinced that GroupedVar should be removed. First, RelOptInfo
can currently have either Var or PlaceHolderVar in its reltarget, so I prefer
to add no more than one kind of expression (GroupedVar can represent either
Aggref or a generic (non-Var) grouping expression). Second, GroupedVar
indicates during cost estimation that the value has been evaluated at lower
node of the join tree and thus the higher nodes should not account for the
evaluation again, see set_pathtarget_cost_width().

> > In the other thread, Robert Haas wrote:
> > > Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> > > agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> > > p.i GROUP BY p.i" you assume that it's OK to do a Partial
> > > HashAggregate over c1.parent rather than p.i. This will be false if,
> > > say, c1.parent is of type citext and p.i is of type text; this will
> > > get grouped together that shouldn't. It will also be false if the
> > > grouping expression is something like GROUP BY length(p.i::text),
> > > because one value could be '0'::numeric and the other '0.00'::numeric.
> > > I can't think of a reason why it would be false if the grouping
> > > expressions are both simple Vars of the same underlying data type, but
> > > I'm a little nervous that I might be wrong even about that case.
> > > Maybe you've handled all of this somehow, but it's not obvious to me
> > > that it has been considered.
> >
> > Ah, I made the same mistake. I did consider the "GROUP BY length(o.i::text)",
> > but not the cross-datatype case. I think we should punt on that for now, and
> > only do the substitution for simple Vars of the same datatype. That seems safe
> > to me.
> Yes, I reached the same conclusion. I'll add this restriction to the next
> version of the patch.


Rleated problem is "binary equality" of grouping keys. We need to avoid
aggregation push-down if it can discard information that JOIN/ON or WHERE
clause needs. The patch does not solve the problem yet. This is where the
discussion ended up: [1]

> > Overall, your patch is in a more polished state than my prototype.
> Probably I spent much more time on it.
> > For easier review, though, I think we should try to get something smaller
> > committed first, and build on that. Let's punt on the Var substitution.

> > And if we punt on the partial aggregation, and only allow pushing down the
> > whole Aggregate, how much smaller would your patch get?
> I can't tell now, need to spend some time looking at the code.

I didn't have enough time to separate "your functionality" and can do it when
I'm back from vacation. If you're courious and want to try yourself, this is
what you need to do to eliminate "my functionality":

* Remove the REL_AGG_KIND_PARTIAL constant (or the whole RelAggKind

* Remove the needs_final_agg field of the RelOptGrouped structure (or the
whole structure and make RelOptInfo grouped point directly to the RelOptInfo
that no_final_agg currently points to).

* Remove code paths "if (GroupedVar.agg_partial != NULL) ..."

* Revert my changes of create_ordinary_grouping_paths()

* Revert the related part of my changes of set_upper_references() (see

And then try to adjust the code so it can compile.

Of course the patch needs quite some polishing, but first we need to
achieve consensus on the concepts.

> > (I'll be on vacation for the next two weeks, but I'll be following this
> > thread. After that, I plan to focus on this feature, as time from reviewing
> > patches in the commitfest permits.)
> Likewise, I'll be off from July 5th to 22nd. I'll post what I have before I
> leave and will see what you could make out of it :-)
> It's cool that you intend to work on this feature too!
> [1]

A few more notes:

* I didn't have time to check if all the regression tests succeed. Besides the
tests that the patch adds I only ran the partition_join test with
enable_agg_pushdown enabled, to see that the patch does push aggregation down
to partitions.

* Older version of my patch contains the postgres_fdw part. I can adjust it
when I'm back.


Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt

Attachment Content-Type Size
agg_pushdown_v7.patch text/x-diff 275.1 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-07-06 13:01:08 Re: [HACKERS] PoC: full merge join on comparison clause
Previous Message Jerry Jelinek 2018-07-06 12:44:18 Re: patch to allow disable of WAL recycling