Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-03-20 03:48:44
Message-ID: CAKJS1f8AtOiUaQb1ipz--K5G1aXH4+GUEsttAYS9GpKhk2_9tA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 March 2016 at 14:25, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 03/16/2016 12:03 PM, David Rowley wrote:
>> Patch 2:
>> This adds the serial/deserial aggregate infrastructure, pg_dump
>> support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it
>> serialise and deserialise aggregate states when instructed to do so.
>>
>> Patch 3:
>> This adds a boat load of serial/deserial functions, and combine
>> functions for most of the built-in numerical aggregate functions. It
>> also contains some regression tests which should really be in patch 2,
>> but I with patch 2 there's no suitable serialisation or
>> de-serialisation functions to test CREATE AGGREGATE with. I think
>> having them here is ok, as patch 2 is quite useless without patch 3
>> anyway.
>
> I don't see how you could move the tests into #2 when the functions are
> defined in #3? IMHO this is the right place for the regression tests.

Yeah, but the CREATE AGGREGATE changes which are being tested in 0003
were actually added in 0002. I think 0002 is the right place to test
the changes to CREATE AGGREGATE, but since there's a complete lack of
functions to use, then I've just delayed until 0003.

>> Another thing to note about this patch is that I've gone and created
>> serial/de-serial functions for when PolyNumAggState both require
>> sumX2, and don't require sumX2. I had thought about perhaps putting an
>> extra byte in the serial format to indicate if a sumX2 is included,
>> but I ended up not doing it this way. I don't really want these serial
>> formats getting too complex as we might like to do fun things like
>> pass them along to sharded servers one day, so it might be nice to
>> keep them simple.
>
>
> Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if it's
> worth the additional complexity at this point. I mean, one byte is hardly
> going to make a measurable difference - we're probably wasting more than
> that due to alignment, for example.

I don't think any alignment gets done here. Look at pq_sendint().
Did you mean the complexity of having extra functions, or having the
extra byte to check in the deserial func?

> As you've mentioned sharded servers - how stable should the serialized
> format be? I've assumed it to be transient, i.e. in the extreme case it
> might change after restarting a database - for the parallel aggregate that's
> clearly sufficient.
>
> But if we expect this to eventually work in a sharded environment, that's
> going to be way more complicated. I guess in these cases we could rely on
> implicit format versioning, via the major the version (doubt we'd tweak the
> format in a minor version anyway).
>
> I'm not sure the sharding is particularly useful argument at this point,
> considering we don't really know if the current format is actually a good
> match for that.

Perhaps you're right. At this stage I've no idea if we'd want to
support shards on varying major versions. I think probably not, since
the node write functions might not be compatible with the node read
functions on the other server.

I've attached another series of patches:

0001: This is the latest Parallel Aggregate Patch, not intended for
review here, but is required for the remaining patches. This patch has
changed quite a bit from the previous one that I posted here, and the
remaining patches needed re-based due to those changes.

0002: Adds serial/de-serial function support to CREATE AGGREGATE,
contains minor fix-ups from last version.

0003: Adds various combine/serial/de-serial functions for the standard
set of aggregates, apart from most float8 aggregates.

0004: Adds regression tests for 0003 pg_aggregate.h changes.

0005: Documents to mention which aggregate functions support partial mode.

0006: Re-based version of Haribabu's floating point aggregate support,
containing some fixes by me.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Allow-aggregation-to-happen-in-parallel_2016-03-20.patch application/octet-stream 50.9 KB
0002-Allow-INTERNAL-state-aggregates-to-participate-in-pa_2016-03-20.patch application/octet-stream 96.8 KB
0003-Add-various-aggregate-combine-and-serialize-de-seria_2016-03-20.patch application/octet-stream 60.9 KB
0004-Add-sanity-regression-tests-for-new-aggregate-serial_2016-03-20.patch application/octet-stream 4.7 KB
0005-Add-documents-to-explain-which-aggregates-support-pa_2016-03-20.patch application/octet-stream 16.7 KB
0006-Add-combine-functions-for-various-floating-point-agg_2016-03-20.patch application/octet-stream 27.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-03-20 04:02:23 Re: VS 2015 support in src/tools/msvc
Previous Message David Rowley 2016-03-20 03:23:03 Re: Combining Aggregates