Re: Combining Aggregates

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-03-17 01:25:02
Message-ID: 00ab52f0-5cd9-a667-5c3d-b0917aef7677@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 03/16/2016 12:03 PM, David Rowley wrote:
> On 16 March 2016 at 10:34, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> If Haribabu's patch does all that's required for the numerical
>> aggregates for floating point types then the status of covered
>> aggregates is (in order of pg_aggregate.h):
>>
>> * AVG() complete coverage
>> * SUM() complete coverage
>> * MAX() complete coverage
>> * MIN() complete coverage
>> * COUNT() complete coverage
>> * STDDEV + friends complete coverage
>> * regr_*,covar_pop,covar_samp,corr not touched these.
>> * bool*() complete coverage
>> * bitwise aggs. complete coverage
>> * Remaining are not touched. I see diminishing returns with making
>> these parallel for now. I think I might not be worth pushing myself
>> any harder to make these ones work.
>>
>> Does what I have done + floating point aggs from Haribabu seem
>> reasonable for 9.6?
>
> I've attached a series of patches.

Thanks. Haven't really done much testing of this, aside from reading
through the patches and "it compiles".
>
> Patch 1:
> This is the parallel aggregate patch, not intended for review here.
> However, all further patches are based on this, and this adds the
> required planner changes to make it possible to test patches 2 and 3.
>
> Patch 2:
> This adds the serial/deserial aggregate infrastructure, pg_dump
> support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
> serialise and deserialise aggregate states when instructed to do so.
>
> Patch 3:
> This adds a boat load of serial/deserial functions, and combine
> functions for most of the built-in numerical aggregate functions. It
> also contains some regression tests which should really be in patch 2,
> but I with patch 2 there's no suitable serialisation or
> de-serialisation functions to test CREATE AGGREGATE with. I think
> having them here is ok, as patch 2 is quite useless without patch 3
> anyway.

I don't see how you could move the tests into #2 when the functions are
defined in #3? IMHO this is the right place for the regression tests.

>
> Another thing to note about this patch is that I've gone and created
> serial/de-serial functions for when PolyNumAggState both require
> sumX2, and don't require sumX2. I had thought about perhaps putting an
> extra byte in the serial format to indicate if a sumX2 is included,
> but I ended up not doing it this way. I don't really want these serial
> formats getting too complex as we might like to do fun things like
> pass them along to sharded servers one day, so it might be nice to
> keep them simple.

Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if
it's worth the additional complexity at this point. I mean, one byte is
hardly going to make a measurable difference - we're probably wasting
more than that due to alignment, for example.

As you've mentioned sharded servers - how stable should the serialized
format be? I've assumed it to be transient, i.e. in the extreme case it
might change after restarting a database - for the parallel aggregate
that's clearly sufficient.

But if we expect this to eventually work in a sharded environment,
that's going to be way more complicated. I guess in these cases we could
rely on implicit format versioning, via the major the version (doubt
we'd tweak the format in a minor version anyway).

I'm not sure the sharding is particularly useful argument at this point,
considering we don't really know if the current format is actually a
good match for that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2016-03-17 01:38:30 Re: pgbench -C -M prepared gives an error
Previous Message Haribabu Kommi 2016-03-17 01:12:51 Re: pg_hba_lookup function to get all matching pg_hba.conf entries