Re: PoC: Grouped base relation

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Grouped base relation
Date: 2017-01-16 04:42:41
Message-ID: CAFjFpRejPP4K=g+0aaq_US0YrMaZzyM+NUCi=JgwaxhMUj2Zcg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 13, 2017 at 10:12 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> On Mon, Jan 9, 2017 at 11:26 PM, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>> > Attached is a draft patch that lets partial aggregation happen at the base
>> > relation level. If the relations contain a relatively small number of groups,
>> > the number of input rows of the aggregation at the query level can be reduced
>> > this way. Also, if append relation and postgres_fdw planning is enhanced
>> > accordingly, a patch like this can let us aggregate individual tables on remote
>> > servers (e.g. shard nodes) and thus reduce the number of rows subject to the
>> > final aggregation.
>
>> For an appendrel, you probably need the ability to switch group->append into
>> append->group
>
> Yes, like the new patch version (see attachment) does:
>
> postgres=# EXPLAIN (COSTS false) SELECT b.j, sum(a.x) FROM a JOIN b ON a.i = b.j GROUP BY b.j;
>                       QUERY PLAN
> ------------------------------------------------------
> Finalize HashAggregate
>   Group Key: b.j
>   ->  Hash Join
>         Hash Cond: (a.i = b.j)
>         ->  Append
>               ->  Partial HashAggregate
>                     Group Key: a.i
>                     ->  Seq Scan on a
>               ->  Partial HashAggregate
>                     Group Key: a_1.i
>                     ->  Seq Scan on a_1
>               ->  Partial HashAggregate
>                     Group Key: a_2.i
>                     ->  Seq Scan on a_2
>         ->  Hash
>               ->  Gather
>                     Workers Planned: 1
>                     ->  Partial HashAggregate
>                           Group Key: b.j
>                           ->  Parallel Seq Scan on b
>
>> For postgres_fdw, we already support aggregate pushdown.
>
> My understanding is that currently it only works if the whole query can be
> evaluated by the FDW. What I try to do is to push down aggregation of an
> individual table, and join the partially-aggregated set with other tables,
> which are not necessarily remote, or which reside on a different remote server.

You will need to invoke the FDW's hook for aggregate pushdown for the base
relations. It would work as long as we don't ask it for transient results.
But I guess that can come later.
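
To make that concrete, the kind of statement such a hook could send to the
remote server for a foreign base relation might look like the following
(a minimal sketch, assuming the thread's table a(i, x) is the foreign table
and only finalized, non-transient values are requested):

    -- hypothetical remote query for base-relation aggregate pushdown on "a";
    -- sum() and count() here are final values, not transition states
    SELECT i, sum(x), count(*) FROM a GROUP BY i;

The local server would then join the pre-grouped rows with "b" and run the
query-level aggregation on top.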

>> But we don't support fetching partial aggregates from a foreign server. What
>> other enhancements do you need?
>
> Here I try to introduce infrastructure for aggregation pushdown and
> propagation of the transient aggregate state values from base relations to the
> final join. postgres_fdw can benefit from it but it's not the only use case,
> so I'd prefer adjusting it in a separate patch.
>
> Yes, an outstanding problem is that the remote nodes need to return transient
> state values - probably using bytea type. I think this functionality should
> even be separate from postgres_fdw (e.g. a new contrib module?), because the
> remote nodes do not need postgres_fdw.

Hmm, that's a missing piece. We need to work on it separately.
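
To spell out the missing piece: the remote server would have to ship the
transition state itself, something along these lines (purely hypothetical;
partial_sum_export() does not exist, its name is only illustrative, and the
state would presumably travel as bytea, as Antonin suggests):

    -- hypothetical: serialize the transition state of sum(x) so that the
    -- local server can keep combining it; no such function exists today
    SELECT i, partial_sum_export(x) AS sum_state, count(*) AS cnt
    FROM a
    GROUP BY i;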

>
>> > A few "footnotes":
>> >
>> >
>> > As for the example, the processing continues by joining the partially grouped
>> > sets:
>> >
>> >  i | sum(x) | count(i.*) | j | count(j.*)
>> > ---+--------+------------+---+------------
>> >  1 |      7 |          2 | 1 |          2
>
> [ Sorry, count(j.*) should be 2, not 3 as I wrote in the initial email. ]
>
>> >
>> >
>> > Before performing the final aggregation, we need to multiply sum(a.x) by
>> > count(j.*) because w/o the aggregation at base relation level the input
>> > of the query-level aggregation would look like
>> >
>> >  a.i | a.x | b.j
>> > -----+-----+-----
>> >    1 |   3 |   1
>> >    1 |   4 |   1
>> >    1 |   3 |   1
>> >    1 |   4 |   1
>> >
>> > In other words, grouping of the base relation "b" below the join prevents the
>> > join from bringing the per-group input set to the aggregate input multiple
>> > times. To compensate for this effect, I've added a new field "aggtransmultifn"
>> > to the pg_aggregate catalog. It "multiplies" the aggregate state without
>> > repeatedly processing the same input set. sum() is an example of an
>> > aggregate that needs such processing, avg() is one that does not.
>>
>> For something like a product aggregation, where the product (or a higher-order
>> operation) across rows is accumulated instead of a sum, mere multiplication
>> wouldn't help. We will need some higher-order operation to "extrapolate" the
>> result based on count(j.*). In fact, the multiplication factor will depend
>> upon the number of relations being joined, e.g. select b.j, sum(a.x) from
>> a, b, c where a.i = b.j and a.i = c.k group by b.j
>
> Maybe you're saying what I already try to do. Let me modify the example
> accordingly (unlike the initial example, each table produces a group of a
> different size so the count() values are harder to confuse):
>
[ ... snip ... ]

This all works well as long as the aggregate is "summing" something
across rows. The method doesn't work when the aggregate is, say,
"multiplying" across the rows or "concatenating" across the rows like
array_agg() or string_agg(). Those need a different strategy to combine
aggregates across relations.
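
A tiny example of the difference, using the thread's data (table "a" holds
the rows (1, 3) and (1, 4), table "b" holds two rows with j = 1):

    -- Without pushdown the join feeds four rows (3, 4, 3, 4) into the
    -- aggregates, so after grouping "a" below the join:
    --   sum(a.x)       = 3 + 4 + 3 + 4 = 14  -- recoverable as (3 + 4) * count(j.*)
    --   array_agg(a.x) = {3,4,3,4}           -- not recoverable from {3,4} and
    --                                        -- count(j.*) = 2 by any scalar "multiply"
    SELECT b.j, sum(a.x), array_agg(a.x)
    FROM a JOIN b ON a.i = b.j
    GROUP BY b.j;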

>
> I get exactly these values when I run your query on master branch w/o my
> patch, so my theory could be correct :-)
>
>> Maybe we want to implement this technique without partial aggregation first,
>> i.e. push aggregation and grouping down the join tree and then add the
>> partial aggregation steps. That might make it easier to review. Looking at
>> the changes in create_plain_partial_paths(), it looks like we use this
>> technique only in the case of a parallel query. I think the technique is
>> useful otherwise as well.
>

IIUC, we are trying to solve multiple problems here:
1. Pushing aggregates/grouping down the join tree, so that the number
of rows to be joined decreases.
This might be a good optimization to have. However, there are problems
in the current patch. Every path built for a relation (join or base)
returns the same result: the rows of that relation, or a subset of them
restricted by parameterization or unification. But this patch changes
that. It creates paths which represent grouping in the base relation.
I think we need a separate relation to represent that result and hold
the paths which produce it. That itself would be a sizable patch.

2. Pushing down aggregates based on equivalence classes, where
grouping properties can be transferred from one relation to another
using the EC mechanism. This seems to require solving the problem of
combining aggregates across the relations, but there might be some
use cases which could benefit even without solving that problem.

3. If the relation to which we push the aggregate is an append
relation, pushing (partial) aggregation/grouping down into the child
relations (see the sketch after this list). We don't do that right now
even for grouped aggregation on a single append relation. Parallel
partial aggregation does something similar, but not exactly per child
relation. That may be a sizable project in itself.
Even without this piece, the rest of the optimizations proposed by this
patch are important.

4. Additional goal: push the aggregation down to any relation
(join or base) where it can be computed.
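
To illustrate point 3 in plain SQL (a sketch only: a1 and a2 stand for
hypothetical children of an inheritance parent "a" with columns i and x, and
the patch would of course do this inside the planner rather than by query
rewriting):

    -- per-child partial aggregates combined at the parent level; this is what
    -- the Partial HashAggregate nodes under Append do in the plan quoted above
    SELECT i, sum(partial_sum) AS sum_x, sum(partial_cnt) AS cnt
    FROM (
        SELECT i, sum(x) AS partial_sum, count(*) AS partial_cnt
        FROM ONLY a GROUP BY i
        UNION ALL
        SELECT i, sum(x), count(*) FROM a1 GROUP BY i
        UNION ALL
        SELECT i, sum(x), count(*) FROM a2 GROUP BY i
    ) AS per_child
    GROUP BY i;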

If we break the problem down into the smaller problems above, 1. the
resulting patches will be easier to review, and 2. since each of those
problems produces a usable feature on its own, there is a chance that more
people will be interested in reviewing/testing/coding them.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
