Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-01-19 02:48:59
Message-ID: CAKJS1f_fpgAE_VSF1gH7obq00wt_Mg3Cr=C2iG9erzaBAvMJjA@mail.gmail.com
Lists: pgsql-hackers

On 19 January 2016 at 02:44, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

> On Mon, Jan 18, 2016 at 10:32 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> I was just thinking of a direct mapping of the structure onto a text
> pointer, something like the below.
>
> result = PG_ARGISNULL(0) ? NULL : (text *) PG_GETARG_POINTER(0);
> state = (PolyNumAggState *)VARDATA(result);
>
> To handle the big-endian or little-endian, we may need some extra changes.
>
> Instead of adding 3 new columns to the pg_aggregate catalog table to handle
> the internal types, either something like the above to handle the internal
> types or some other way is better IMO.

The problem with that is that most of these internal structs for the
aggregate states contain pointers to other memory. Even if we laid the
struct's bytes down into a bytea or something, that would copy only the
pointer values, not the memory they point to. When the other process then
dereferenced those pointers, we'd have problems, as the addresses are only
valid in the process that created them.

For example PolyNumAggState is defined as:

typedef NumericAggState PolyNumAggState;

and NumericAggState has:

NumericVar sumX; /* sum of processed numbers */
NumericVar sumX2; /* sum of squares of processed numbers */

And NumericVar has:

NumericDigit *buf; /* start of palloc'd space for digits[] */
NumericDigit *digits; /* base-NBASE digits */

Both of these point to other memory which won't be in the varlena type.

Serialization is the process of collecting the struct and everything its
pointers reference into some consecutive bytes.
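
As a rough illustration of that idea (this is not PostgreSQL code: DemoVar,
demo_serialize and demo_deserialize are names made up for this sketch, and
plain malloc stands in for palloc), flattening a struct that owns a separately
allocated array might look something like the below. It writes the scalar
fields and then the pointed-to digits, never the pointer itself, and it uses
the native byte order, so it would only suit two processes on the same
machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a state struct that points at other memory. */
typedef struct DemoVar
{
    int32_t  ndigits;   /* number of entries in digits[] */
    int32_t  weight;    /* illustrative scalar field */
    int16_t *digits;    /* separately allocated array */
} DemoVar;

/*
 * Flatten a DemoVar into one malloc'd buffer: two int32 header fields
 * followed by the digits themselves. The pointer value is never written.
 * Returns NULL on allocation failure; *len receives the buffer size.
 */
unsigned char *
demo_serialize(const DemoVar *src, size_t *len)
{
    size_t         hdr = 2 * sizeof(int32_t);
    size_t         body;
    unsigned char *buf;

    assert(src->ndigits >= 0);
    body = (size_t) src->ndigits * sizeof(int16_t);
    buf = malloc(hdr + body);
    if (buf == NULL)
        return NULL;

    memcpy(buf, &src->ndigits, sizeof(int32_t));
    memcpy(buf + sizeof(int32_t), &src->weight, sizeof(int32_t));
    if (body > 0)
        memcpy(buf + hdr, src->digits, body);
    *len = hdr + body;
    return buf;
}

/*
 * Rebuild a DemoVar from the flat buffer, allocating a fresh digits array
 * owned by the receiver. Returns 0 on success, -1 on a malformed buffer or
 * allocation failure.
 */
int
demo_deserialize(const unsigned char *buf, size_t len, DemoVar *dst)
{
    size_t hdr = 2 * sizeof(int32_t);
    size_t body;

    if (len < hdr)
        return -1;
    memcpy(&dst->ndigits, buf, sizeof(int32_t));
    memcpy(&dst->weight, buf + sizeof(int32_t), sizeof(int32_t));
    if (dst->ndigits < 0)
        return -1;

    body = (size_t) dst->ndigits * sizeof(int16_t);
    if (len != hdr + body)
        return -1;

    dst->digits = NULL;
    if (body > 0)
    {
        dst->digits = malloc(body);
        if (dst->digits == NULL)
            return -1;
        memcpy(dst->digits, buf + hdr, body);
    }
    return 0;
}
```

The receiver ends up with its own copy of the digits at an address that is
valid in its own address space, which is exactly what a raw copy of the
struct's bytes cannot give you.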

Of course, that's not to say that every aggregate state struct contains
pointers. I've not checked, but for any that don't, we could (perhaps) just
make the serialize and deserialize functions a simple memcpy() into a bytea.
In reality though, as you mentioned, we'd likely want to agree on some
format that's portable across platforms with different byte orders, as
we'll probably, one day, want to forward these values over to some other
server to finish off the aggregation.
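
To make the byte order point a bit more concrete, here is a minimal sketch
for a pointer-free state. DemoAvgState and the demo_avg_* functions are
hypothetical, not anything in the tree, and big-endian is just one possible
choice of agreed format. Writing each field byte by byte gives the same 16
bytes on any platform, which a plain memcpy() of the struct would not.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_STATE_SIZE 16      /* two 8-byte fields */

/* Hypothetical pointer-free aggregate state. */
typedef struct DemoAvgState
{
    int64_t count;
    int64_t sum;
} DemoAvgState;

/* Store v at p, most significant byte first, regardless of host order. */
static void
put_u64_be(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char) (v >> (56 - 8 * i));
}

/* Read 8 bytes at p as a big-endian unsigned value. */
static uint64_t
get_u64_be(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Write the state in a fixed, platform-independent layout. */
void
demo_avg_serialize(const DemoAvgState *s, unsigned char out[DEMO_STATE_SIZE])
{
    put_u64_be(out, (uint64_t) s->count);
    put_u64_be(out + 8, (uint64_t) s->sum);
}

/*
 * Read the state back. The casts to int64_t assume a two's complement
 * platform, which covers everything we're likely to run on.
 */
void
demo_avg_deserialize(const unsigned char in[DEMO_STATE_SIZE], DemoAvgState *s)
{
    s->count = (int64_t) get_u64_be(in);
    s->sum = (int64_t) get_u64_be(in + 8);
}
```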

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
