Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2015-12-24 01:28:59
Message-ID: CAKJS1f8WOG4UKztMjuK-JkaFDaWRWCde4L=QGM7rS-r19egthw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 December 2015 at 21:50, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> Apart from the problems I listed above, I'm reasonably happy with the
> patch now, and I'm ready for someone else to take a serious look at it.
>

I forgot to mention another situation where this patch might need a bit
more thought. This does not affect any of the built in aggregate
functions, but if someone created a user defined aggregate such as:

CREATE AGGREGATE sumplusten (int)
(
stype = int,
sfunc = int4pl,
combinefunc = int4pl,
initcond = '10'
);

Note the initcond which initialises the state to 10. Then let's say we
later add the ability to push aggregates down below a Append.

create table t1 as select 10 as value from generate_series(1,10);
create table t2 as select 10 as value from generate_series(1,10);

With the following we get the correct result:

# select sumplusten(value) from (select * from t1 union all select * from
t2) t;
sumplusten
------------
210
(1 row)

But if we simulate how it would work in the aggregate push down situation:

# select sum(value) from (select sumplusten(value) as value from t1 union
all select sumplusten(value) as value from t2) t;
sum
-----
220
(1 row)

Here I'm using sum() as a mock up of the combine function, but you get the
idea... Since we initialize the state twice, we get the wrong result.

Now I'm not too sure if anyone would have an aggregate defined like this in
the real world, but I don't think that'll give us a good enough excuse to
go returning wrong results.

In the patch I could fix this by changing partial_aggregate_walker() to
disable partial aggregation if the aggregate has an initvalue, but that
would exclude a number of built-in aggregates unnecessarily. Alternatively
if there was some way to analyze the initvalue to see if it was "zero"...
I'm just not quite sure how we might do that at the moment.

Any thoughts on a good way to fix this that does not exclude built-in
aggregates with an initvalue are welcome.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-12-24 01:56:12 Re: Combining Aggregates
Previous Message Peter Geoghegan 2015-12-24 01:09:58 Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates