Re: Parallel Aggregates for string_agg and array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-28 00:06:37
Message-ID: 452e8b7f-418b-4e61-c212-c0e4410b94ad@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/27/2018 04:51 AM, David Rowley wrote:
> On 27 March 2018 at 13:45, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>> On 27 March 2018 at 12:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Oh, I thought of another thing that would need to get done, if we decide
>>> to commit this. array_agg_serialize/deserialize only work if the array
>>> element type has send/receive functions. The planner's logic to decide
>>> whether partial aggregation is possible doesn't look any deeper than
>>> whether the aggregate has serialize/deserialize, but I think we have to
>>> teach it that if it's these particular serialize/deserialize functions,
>>> it needs to check the array element type. Otherwise we'll get runtime
>>> failures if partial aggregation is attempted on array_agg().
>>>
>>> This would not matter for any core datatypes, since they all have
>>> send/receive functions, but I imagine it's still pretty common for
>>> extension types not to.
>>>
>>> For my money it'd be sufficient to hard-code the test like
>>>
>>> if ((aggserialfn == F_ARRAY_AGG_SERIALIZE ||
>>> aggdeserialfn == F_ARRAY_AGG_DESERIALIZE) &&
>>> !array_element_type_has_send_and_recv(exprType(aggregate input)))
>>
>> I'd failed to consider this.
>>
>> Would it not be cleaner to just reject trans types without a
>> send/receive function? Or do you think that would exclude other valid
>> cases?
>
> Seems I didn't mean "trans types". I should have said aggregate
> function argument types.
>
> The attached I adds this check.
>

I'm not sure that's better than the check proposed by Tom. An argument
type without send/receive function does not necessarily mean we can't
serialize/deserialize the trans value. Because who says the argument
value will be embedded in the trans value?

For example, I might create an aggregate to build a bloom filter,
accepting anyelement. That will hash the argument value, and obviously
has no issues with serializing/deserializing the trans value.

This check would effectively disable parallelism for all aggregates with
anyarray, anyelement, anynonarray, anyenum (and a few more) arguments.
That for example includes jsonb_agg, which some people already mentioned
as another candidate for enabling parallelism. Not great, I guess.

Tom's solution is much more focused - it recognizes that array_agg is a
fairly rare case where the (actual) argument type gets stored in the
trans type, and detects that.

PS: The function is misnamed - you're talking about argument types, yet
the function name refers to transtypes.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-03-28 00:09:10 Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Previous Message Haribabu Kommi 2018-03-28 00:06:23 Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types