Re: Parallel Aggregates for string_agg and array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-10 23:11:57
Message-ID: 24abe018-b3c6-2117-b5b9-39e660db00bd@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/05/2018 04:51 AM, David Rowley wrote:
> On 5 March 2018 at 04:54, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> 1) There seems to be forgotten declaration of initArrayResultInternal in
>> arrayfuncs.c. I suppose you've renamed it to initArrayResultWithSize and
>> moved it to a header file, and forgotten to remove this bit.
>
> Oops. Must be left over from trying things another way. Removed.
>
>> 2) bytea_string_agg_deserialize has this comment:
>>
>> /*
>> * data: technically we could reuse the buf.data buffer, but that is
>> * slightly larger than we need due to the extra 4 bytes for the cursor
>> */
>>
>> I find the argument "it has 4 extra bytes, so we'll allocate new chunk"
>> somewhat weak. We do allocate the memory in 2^n chunks anyway, so the
>> new chunk is likely to be much larger anyway. I'm not saying we need to
>> reuse the buffer, IMHO the savings will be non-measurable.
>
> Agreed. I've removed that part of the comment.
>
>> 3) string_agg_transfn and bytea_string_agg_transfn say this"
>>
>> /*
>> * We always append the delimiter text before the value text, even
>> * with the first aggregated item. The reason for this is that if
>> * this state needs to be combined with another state using the
>> * aggregate's combine function, then we need to have the delimiter
>> * for the first aggregated item. The first delimiter will be
>> * stripped off in the final function anyway. We use a little cheat
>> * here and store the position of the actual first value (after the
>> * delimiter) using the StringInfo's cursor variable. This relies on
>> * the cursor not being used for any other purpose.
>> */
>>
>> How does this make the code any simpler, compared to not adding the
>> delimiter (as before) and adding it when combining the values (at which
>> point you should know when it's actually needed)?
>
> The problem is that if we don't record the first delimiter then we
> won't know what it is when it comes to combining the states.
>
> Consider the following slightly backward-looking case;
>
> select string_agg(',', x::text) from generate_Series(1,10)x;
> string_agg
> ----------------------
> ,2,3,4,5,6,7,8,9,10,
>
> Here the delimiter is the number, not the ','. Of course, the
> delimiter for the first aggregated result is not shown. The previous
> implementation of the transfn for this aggregate just threw away the
> first delimiter, but that's no good for parallel aggregates as the
> transfn may be getting called in a parallel worker, in which case
> we'll need that delimiter in the combine function to properly join to
> the other partial aggregated results matching the same group.
>

Hmmm, you're right I haven't considered the delimiter might be variable.
But isn't just yet another hint that while StringInfo was suitable for
aggregate state of non-parallel string_agg, it may not be for the
parallel version?

What if the parallel string_agg instead used something like this:

struct StringAggState
{
char *delimiter; /* first delimiter */
StringInfoData str; /* partial aggregate state */
} StringAggState;

I think that would make the code cleaner, compared to using the cursor
field for that purpose. But maybe it'd make it not possible to share
code with the non-parallel aggregate, not sure.

I always considered the cursor field to be meant for scanning through
StringInfo, so using it for this purpose seems a bit like a misuse.

> Probably I could improve the comment a bit. I understand that having a
> variable delimiter is not commonplace in the normal world. I certainly
> had never considered it before working on this. I scratched my head a
> bit when doing this and thought about inventing a new trans type, but
> I decided that the most efficient design for an aggregate state was to
> store the delimiter and data all in one buffer and have a pointer to
> the start of the actual data... StringInfo has exactly what's required
> if you use the cursor as the pointer, so I didn't go and reinvent the
> wheel.
>

I wonder why the variable delimiter is not mentioned in the docs ... (at
least I haven't found anything mentioning the behavior).

> I've now changed the comment to read:
>
> /*
> * It's important that we store the delimiter text for all aggregated
> * items, even the first one, which at first thought you might think
> * could just be discarded. The reason for this is that if this
> * function is being called from a parallel worker, then we'll need
> * the first delimiter in order to properly combine the partially
> * aggregated state with the states coming from other workers. In the
> * final output, the first delimiter will be stripped off of the final
> * aggregate state, but in order to know where the actual first data
> * is we must store the position of the first data value somewhere.
> * Conveniently, StringInfo has a cursor property which can be used
> * to serve our needs here.
> */
>
> I've also updated an incorrect Assert in array_agg_array_serialize.
>
> Please find attached the updated patch.
>

Seems fine to me. I plan to do a bit more testing/review next week, but
I plan to move it to RFC after that (unless I run into something during
the review, but I don't expect that).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-10 23:19:31 Re: VACUUM FULL vs dropped columns
Previous Message Tom Lane 2018-03-10 22:48:28 Intermittent pg_ctl failures on Windows