Re: Combining Aggregates

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2015-12-30 00:39:55
Message-ID: CAKJS1f-jc4tBC4VfXNaVv5FhVKCz0HFFSoFGf9-_tH=HTztawA@mail.gmail.com
Lists: pgsql-hackers

On 25 December 2015 at 14:10, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Dec 21, 2015 at 4:53 PM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > On 22 December 2015 at 01:30, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> Can we use Tom's expanded-object stuff instead of introducing
> >> aggserialfn and aggdeserialfn? In other words, if you have an
> >> aggtranstype = INTERNAL, then what we do is:
> >>
> >> 1. Create a new data type that represents the transition state.
> >> 2. Use expanded-object notation for that data type when we're just
> >> within a single process, and flatten it when we need to send it
> >> between processes.
> >>
> >
> > I'd not seen this before, but on looking at it I'm not sure if it will
> > be practical to use for this. I may have missed something, but it seems
> > that after each call of the transition function, I'd need to ensure that
> > the INTERNAL state was in the varlena format.
>
> No, the idea I had in mind was to allow it to continue to exist in the
> expanded format until you really need it in the varlena format, and
> then serialize it at that point. You'd actually need to do the
> opposite: if you get an input that is not in expanded format, expand
> it.

Admittedly I'm struggling to see how this can be done. I've spent a good
bit of time analysing how the expanded object stuff works.

Hypothetically let's say we can make it work like:

1. During partial aggregation (finalizeAggs = false), in
finalize_aggregates(), where we'd normally call the final function, instead
flatten INTERNAL states and store the flattened Datum instead of the
pointer to the INTERNAL state.
2. During combining aggregation (combineStates = true), have all the combine
functions written in such a way that they expand the flattened states back
into INTERNAL states before combining the aggregate states.
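
As a rough stand-alone sketch of those two steps (a toy, not executor code:
AvgState, FlatAvg, flatten_avg(), expand_avg() and combine_avg() are all
hypothetical names, and the flat layout is just an assumed length-prefixed
byte buffer, only loosely like a varlena):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory (INTERNAL) state for an avg-like aggregate. */
typedef struct AvgState {
    int64_t count;
    double  sum;
} AvgState;

#define AVG_FLAT_LEN (sizeof(int64_t) + sizeof(double))

/* Hypothetical flat form that could be passed between processes. */
typedef struct FlatAvg {
    uint32_t      len;
    unsigned char data[AVG_FLAT_LEN];
} FlatAvg;

/* Step 1: at the end of partial aggregation, flatten instead of finalizing. */
FlatAvg flatten_avg(const AvgState *st)
{
    FlatAvg flat;

    flat.len = (uint32_t) AVG_FLAT_LEN;
    memcpy(flat.data, &st->count, sizeof(int64_t));
    memcpy(flat.data + sizeof(int64_t), &st->sum, sizeof(double));
    return flat;
}

/* Rebuild the in-memory state from its flat form. */
AvgState expand_avg(const FlatAvg *flat)
{
    AvgState st;

    assert(flat->len == AVG_FLAT_LEN);
    memcpy(&st.count, flat->data, sizeof(int64_t));
    memcpy(&st.sum, flat->data + sizeof(int64_t), sizeof(double));
    return st;
}

/* Step 2: the combine function expands the flat input, then merges it. */
void combine_avg(AvgState *dst, const FlatAvg *flat_src)
{
    AvgState src = expand_avg(flat_src);

    dst->count += src.count;
    dst->sum += src.sum;
}
```

Note that flatten_avg() here is specific to one state layout; the caller has
to know which flatten function goes with which state.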

Does that sound like what you had in mind?

If so, I can't quite seem to wrap my head around 1, as I'm really not quite
sure how, from finalize_aggregates(), we'd flatten the INTERNAL pointer. I
mean, how do we know which flatten function to call here? From reading the
expanded-object code I see that it's used in expand_array(). In this case we
know we're working with arrays, so it just always uses the EA_methods
globally scoped struct to get the function pointers it requires for
flattening the array. For the case of finalize_aggregates(), the best I can
think of here is to have a bunch of global structs and then have a giant
case statement to select the correct one. That's clearly horrid, and not
commit-worthy, and it does nothing to help user-defined aggregates which
use INTERNAL types. Am I missing something here?
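
To make the concern concrete, here is a minimal stand-alone sketch of the
global-structs-plus-case-statement idea. Nothing in it is PostgreSQL code:
FlattenMethods, AggKind, CountState and lookup_flatten_methods() are
hypothetical names made up for illustration. The point is only that the
lookup has no answer for a state it was not hard-coded to know about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one set of flatten callbacks per known INTERNAL layout. */
typedef struct FlattenMethods {
    size_t (*flat_size)(const void *state);
    void   (*flatten_into)(const void *state, void *dst);
} FlattenMethods;

/* Hypothetical built-in state: a bare row counter. */
typedef struct CountState {
    int64_t count;
} CountState;

size_t count_flat_size(const void *state)
{
    (void) state;               /* size does not depend on the contents */
    return sizeof(int64_t);
}

void count_flatten_into(const void *state, void *dst)
{
    assert(state != NULL && dst != NULL);
    memcpy(dst, &((const CountState *) state)->count, sizeof(int64_t));
}

/* One global methods struct per built-in state type. */
static const FlattenMethods count_methods = {
    count_flat_size,
    count_flatten_into
};

/* Hypothetical tag telling the executor what the opaque pointer is. */
typedef enum AggKind {
    AGG_KIND_COUNT,
    AGG_KIND_USER_DEFINED
} AggKind;

/* The "giant case statement": only hard-coded kinds can be flattened. */
const FlattenMethods *lookup_flatten_methods(AggKind kind)
{
    switch (kind)
    {
        case AGG_KIND_COUNT:
            return &count_methods;
        /* ... one case per built-in INTERNAL state would go here ... */
        default:
            return NULL;        /* user-defined state: nothing to call */
    }
}
```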

As of the most recent patch I posted, having the serial and deserial
functions in the catalogs allows user defined aggregates with INTERNAL
states to work just fine. Admittedly I'm not all that happy that I've had
to add 4 new columns to pg_aggregate to support this, but if I could think
of how to make it work without doing that, then I'd likely go and do that
instead.
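
A rough sketch of why keeping the functions with each aggregate handles
user-defined states. This is a stand-alone toy and not the patch: AggDef is a
made-up stand-in for a pg_aggregate row, its serialfn and deserialfn members
are plain C function pointers with assumed signatures, and MyUserState is an
invented user-defined state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a catalog row: each aggregate brings its own
 * serialize and deserialize functions. */
typedef struct AggDef {
    const char *name;
    size_t (*serialfn)(const void *state, unsigned char *buf, size_t buflen);
    int    (*deserialfn)(const unsigned char *buf, size_t len, void *state_out);
} AggDef;

/* Invented user-defined INTERNAL state; generic code never looks inside. */
typedef struct MyUserState {
    int32_t hits;
    int32_t misses;
} MyUserState;

/* Returns the number of bytes written, or 0 if the buffer is too small. */
size_t my_serial(const void *state, unsigned char *buf, size_t buflen)
{
    if (buflen < sizeof(MyUserState))
        return 0;
    memcpy(buf, state, sizeof(MyUserState));
    return sizeof(MyUserState);
}

/* Returns 0 on success, -1 if the input length is not what we expect. */
int my_deserial(const unsigned char *buf, size_t len, void *state_out)
{
    if (len != sizeof(MyUserState))
        return -1;
    memcpy(state_out, buf, sizeof(MyUserState));
    return 0;
}

/* Generic side: works for any aggregate, with no knowledge of the layout. */
size_t send_partial_state(const AggDef *agg, const void *state,
                          unsigned char *buf, size_t buflen)
{
    assert(agg->serialfn != NULL);
    return agg->serialfn(state, buf, buflen);
}

int receive_partial_state(const AggDef *agg, const unsigned char *buf,
                          size_t len, void *state_out)
{
    assert(agg->deserialfn != NULL);
    return agg->deserialfn(buf, len, state_out);
}
```

The generic functions only follow the pointers stored with the aggregate, so
no central list of known state types is needed.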

If your problem with the serialize and deserialize stuff is around the
serialized format, then I can see no reason why we couldn't just invent some
composite types for the current INTERNAL aggregate states, and have the
serialfn convert the INTERNAL state into one of those, then have the
deserialfn perform the opposite. Likely this would be neater than what I
have at the moment with just converting the INTERNAL state into text.

Please let me know what I'm missing with the expanded-object code.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
