Re: Combining Aggregates

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2014-12-17 13:48:39
Message-ID: CA+U5nMKL3_K_-g6TTVYAgkqBigtAAmAmPe1rmOKhh9h-zB_qFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:

> Its concept is good to me. I think, the new combined function should be
> responsible to take a state data type as argument and update state object
> of the aggregate function. In other words, combined function performs like
> transition function but can update state object according to the summary
> of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

> It also needs some enhancement around Aggref/AggrefExprState structure to
> inform which function shall be called on execution time.
> Combined functions are usually no-thank-you. AggrefExprState updates its
> internal state using transition function row-by-row. However, once someone
> push-down aggregate function across table joins, combined functions have
> to be called instead of transition functions.
> I'd like to suggest Aggref has a new flag to introduce this aggregate expects
> state object instead of scalar value.
>
> Also, I'd like to suggest one other flag in Aggref not to generate final
> result, and returns state object instead.
>
> Let me use David's example but little bit adjusted.
>
> original)
> SELECT p.name, AVG(s.qty)
> FROM sales s INNER JOIN product p ON s.product_id = s.product_id
> GROUP BY p.name;
>
> modified)
> SELECT p.name, AVG(qty)
> FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
> INNER JOIN product p
> ON p.product_id = s.product_id GROUP BY p_name;
>
> Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
> the main query, and also set a flag of not_generate_final on the AVG(qty) of
> the sub-query.
> It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

Is there a way of testing this in existing code? Or do we need to add
something to test it?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-12-17 14:07:34 Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Previous Message Heikki Linnakangas 2014-12-17 13:40:32 exitArchiveRecovery woes