Re: Combining Aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2014-12-17 10:20:29
Message-ID: CAApHDvpFQ-TYta2sTzcGCpfZ9B5b9h75cc99usVekQo=qN_LaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 December 2014 at 22:53, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> KaiGai, David Rowley and myself have all made mention of various ways
> we could optimize aggregates.
>
> Following WIP patch adds an extra function called a "combining
> function", that is intended to allow the user to specify a
> semantically correct way of breaking down an aggregate into multiple
> steps.
>
> Gents, is this what you were thinking? If not...
>
>
Very much so! You must have missed my patch.

http://www.postgresql.org/message-id/CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com

The cases I think that may benefit from such infrastructure would be:

1. Parallel queries, where each worker could work on a portion of the
tuples being aggregated and then the combine/merge function is called in
the end in order to produce the final aggregated result. We currently don't
yet have parallel query, so we can't really commit anything yet, for that
reason.

2. Queries such as:

SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
= s.product_id GROUP BY p.name;

Such a query could be transformed into:

SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;

Of course the outer query's SUM and GROUP BY would not be required if there
happened to be a UNIQUE index on product(name), but assuming there's not
then the above should produce the results faster. This of course works ok
for SUM(), but for something like AVG() or STDDEV() the combine/merge
aggregate functions would be required to process those intermediate
aggregate results that were produced by the sub-query.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2014-12-17 11:00:41 Re: analyze_new_cluster.bat and delete_old_cluster.bat not ignored with vcregress upgradecheck
Previous Message Simon Riggs 2014-12-17 10:18:51 Re: Combining Aggregates