Re: Push down Aggregates below joins

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Antonin Houska <ah(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down Aggregates below joins
Date: 2018-06-21 08:00:34
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On 21/06/18 09:11, Antonin Houska wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On 06/20/2018 10:12 PM, Heikki Linnakangas wrote:
>>> Currently, the planner always first decides the scan/join order, and
>>> adds Group/Agg nodes on top of the joins. Sometimes it would be legal,
>>> and beneficial, to perform the aggregation below a join. I've been
>>> hacking on a patch to allow that.
>> There was a patch [1] from Antonin Houska aiming to achieve something
>> similar. IIRC it aimed to push the aggregate down in more cases,
>> leveraging the partial aggregation stuff.
> Yes, I interrupted the work when it become clear that it doesn't find its way
> into v11. I'm about to get back to it next week, and at least rebase it so it
> can be applied to the current master branch.

Ah, cool! I missed that thread earlier. Yes, seems like we've been
hacking on the same feature. Let's compare!

I've been using this paper as a guide:

"Including Group-By in Query Optimization", by Surajit Chaudhuri and
Kyuseok Shim:

Using the terms from that paper, my patch does only "Invariant Grouping
transormation", while yours can do "Simple Coalescing Grouping", which
is more general. In layman terms, my patch can push the Aggregate below
a join, while your patch can also split an Aggregate so that you do a
partial aggregate below the join, and a final stage above it. My
thinking was to start with the simpler Invariant Grouping transformation
first, and do the more advanced splitting into partial aggregates later,
as a separate patch.

Doing partial aggregation actually made your patch simpler in some ways,
though. I had to make some changes to grouping_planner(), because in my
patch, it is no longer responsible for aggregation. In your patch, it's
still responsible for doing the final aggregation.

There's some difference in the way we're dealing with the grouped
RelOptInfos. You're storing them completely separately, in PlannerInfo's
new 'simple_grouped_rel_array' array and 'join_grouped_rel_list/hash'.
I'm attaching each grouped RelOptInfo to its parent.

I got away with much less code churn in allpaths.c, indxpath.c,
joinpath.c. You're adding new 'do_aggregate' flags to many functions.
I'm not sure if you needed that because you do partial aggregation and I
don't, but it would be nice to avoid it.

You're introducing a new GroupedVar expression to represent Aggrefs
between the partial and final aggregates, while I'm just using Aggref
directly, above the aggregate node. I'm not thrilled about introducing
an new Var-like expression. We already have PlaceHolderVars, and I'm
always confused on how those work. But maybe that's necessary to supprot
partial aggregation?

In the other thread, Robert Haas wrote:
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i. This will be false if,
> say, c1.parent is of type citext and p.i is of type text; this will
> get grouped together that shouldn't. It will also be false if the
> grouping expression is something like GROUP BY length(p.i::text),
> because one value could be '0'::numeric and the other '0.00'::numeric.
> I can't think of a reason why it would be false if the grouping
> expressions are both simple Vars of the same underlying data type, but
> I'm a little nervous that I might be wrong even about that case.
> Maybe you've handled all of this somehow, but it's not obvious to me
> that it has been considered.

Ah, I made the same mistake. I did consider the "GROUP BY
length(o.i::text)", but not the cross-datatype case. I think we should
punt on that for now, and only do the substitution for simple Vars of
the same datatype. That seems safe to me.

Overall, your patch is in a more polished state than my prototype. For
easier review, though, I think we should try to get something smaller
committed first, and build on that. Let's punt on the Var substitution.
And I'd suggest adopting my approach of attaching the grouped
RelOptInfos to the parent RelOptInfo, that seems simpler. And if we punt
on the partial aggregation, and only allow pushing down the whole
Aggregate, how much smaller would your patch get?

(I'll be on vacation for the next two weeks, but I'll be following this
thread. After that, I plan to focus on this feature, as time from
reviewing patches in the commitfest permits.)

- Heikki

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Arseny Sher 2018-06-21 08:31:17 Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.
Previous Message Masahiko Sawada 2018-06-21 07:49:34 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)