Re: Combining Aggregates

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>
Subject: Re: Combining Aggregates
Date: 2016-01-18 03:44:42
Message-ID: CA+TgmobRsLKbVu7Mh6VrPLYO+mV5jfsizBrp_eHqNNb6110_fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> hmm, so wouldn't that mean that the transition function would need to (for
> each input tuple):
>
> 1. Parse that StringInfo into tokens.
> 2. Create a new aggregate state object.
> 3. Populate the new aggregate state based on the tokenised StringInfo, this
> would perhaps require that various *_in() functions are called on each
> token.
> 4. Add the new tuple to the aggregate state.
> 5. Build a new StringInfo based on the aggregate state modified in 4.
>
> ?

I don't really know what you mean by parse the StringInfo into tokens.
The whole point of the expanded-object interface is to be able to keep
things in an expanded internal form so that you *don't* have to
repeatedly construct and deconstruct internal data structures. I
worked up an example of this approach using string_agg(), which I
attach here. This changes the transition type of string_agg() from
internal to text. The same code would work for bytea_string_agg(),
which would allow removal of some other code, but this patch doesn't
do that, because the point of this is to elucidate the approach.

In my tests, this seems to be slightly slower than what we're doing
today; worst of all, it adds a handful of cycles to
advance_transition_function() even when the aggregate is not an
expanded object or, indeed, not even pass-by-reference. Some of this
might be able to be fixed by a little massaging - in particular,
DatumIsReadWriteExpandedObject() seems like it could be partly or
entirely inlined, and maybe there's some other way to improve the
coding here.

Generally, I think finding a way to pass expanded objects through
nodeAgg.c would be a good thing to pursue, if we can make it work.
The immediate impetus for changing things this way would be that we
wouldn't need to add a mechanism for serializing and deserializing
internal functions just to pass around partial aggregates. But
there's another advantage, too: right now,
advance_transition_function() does a lot of data copying to move data
from per-call context to the per-aggregate context. When an expanded
object is in use, this can be skipped.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
expandedstring-v1.patch text/x-diff 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-01-18 03:47:47 Compiler warning in pg_am changes
Previous Message Bruce Momjian 2016-01-18 02:47:24 Re: Optimizer questions