Re: Combining Aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2014-12-17 19:06:47
Message-ID: CAApHDvrSYKBUkBZMG6e8SfaPFsO7bXK+tSWkbo_t_uO26MRtkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 December 2014 at 02:48, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
> > Its concept is good to me. I think, the new combined function should be
> > responsible to take a state data type as argument and update state object
> > of the aggregate function. In other words, combined function performs
> like
> > transition function but can update state object according to the summary
> > of multiple rows. Right?
>
> That wasn't how I defined it, but I think your definition is better.
>
> > It also needs some enhancement around Aggref/AggrefExprState structure to
> > inform which function shall be called on execution time.
> > Combined functions are usually no-thank-you. AggrefExprState updates its
> > internal state using transition function row-by-row. However, once
> someone
> > push-down aggregate function across table joins, combined functions have
> > to be called instead of transition functions.
> > I'd like to suggest Aggref has a new flag to introduce this aggregate
> expects
> > state object instead of scalar value.
> >
> > Also, I'd like to suggest one other flag in Aggref not to generate final
> > result, and returns state object instead.
> >
> > Let me use David's example but little bit adjusted.
> >
> > original)
> > SELECT p.name, AVG(s.qty)
> > FROM sales s INNER JOIN product p ON s.product_id = s.product_id
> > GROUP BY p.name;
> >
> > modified)
> > SELECT p.name, AVG(qty)
> > FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY
> product_id) s
> > INNER JOIN product p
> > ON p.product_id = s.product_id GROUP BY p_name;
> >
> > Let's assume the modifier set a flag of use_combine_func on the AVG(qty)
> of
> > the main query, and also set a flag of not_generate_final on the
> AVG(qty) of
> > the sub-query.
> > It shall work as we expected.
>
> That matches my thinking exactly.
>
> David, if you can update your patch with some docs to explain the
> behaviour, it looks complete enough to think about committing it in
> early January, to allow other patches that depend upon it to stand a
> chance of getting into 9.5. (It is not yet ready, but I see it could
> be).
>
>
Yes I'll add something to it and post here.

> The above example is probably the best description of the need, since
> user defined aggregates must also understand this.
>
> Could we please call these "combine functions" or other? MERGE is an
> SQL Standard statement type that we will add later, so it will be
> confusing if we use the "merge" word in this context.
>
>
Yeah I think you're right, combine may help remove some confusion when we
get MERGE.

> David, your patch avoids creating any mergefuncs for existing
> aggregates. We would need to supply working examples for at least a
> few of the builtin aggregates, so we can test the infrastructure. We
> can add examples for all cases later.
>
>
I added merge/combine functions for all the aggregates I could do by making
use of existing functions. I did all the MAX() and MIN() ones and
bit_and(), bit_or(), and a few sum() ones that didn't have a final function.

It felt a bit premature to add new functions to support avg and stddev,
since that's probably the bulk of the work.

> Is there a way of testing this in existing code? Or do we need to add
> something to test it?
>
>
I can't think of anyway to test it. Apart from the create aggregate syntax
test I also added.

Standalone calls to the combine/merge functions I don't think would be
testing anything new.

That's the reason I thought it wasn't really acceptable until we have a use
for this. That's why I posted on the thread about parallel seqscan. I hoped
that Amit could add something which needed it and I could get it committed
that way.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-12-17 19:13:40 Re: btree_gin and ranges
Previous Message Alvaro Herrera 2014-12-17 19:02:20 Re: moving from contrib to bin