Re: Parallel Aggregates for string_agg and array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-04-05 20:49:03
Message-ID: 8f76820a-5d3a-bade-59c3-a3852b730826@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/05/2018 05:41 AM, David Rowley wrote:
> Hi Tomas,
>
> Thanks for taking another look.
>
> On 5 April 2018 at 07:12, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Seems fine to me, although we should handle the anyarray case too, I
>> guess. That is, get_agg_clause_costs_walker() should do this too:
>>
>> /* Same thing for array_agg_array_(de)serialize. */
>> if ((aggserialfn == F_ARRAY_AGG_ARRAY_SERIALIZE ||
>> aggdeserialfn == F_ARRAY_AGG_ARRAY_DESERIALIZE) &&
>> !agg_args_support_sendreceive(aggref))
>> costs->hasNonSerial = true;
>
> hmm, array_agg_array_serialize and array_agg_array_deserialize don't
> use the send/receive functions though, so not sure why that's
> required?
>
>> Other than that, the patch seems fine to me, and it's already marked as
>> RFC so I'll leave it at that.
>
> Thanks.
>
>> The last obstacle seems to be the argument about the risks of the patch
>> breaking queries of people relying on the ordering. Not sure what's are
>> the right next steps in this regard ...
>
> yeah, seems like a bit of a stalemate.
>
> Personally, think if the group of people Tom mentions do exist, then
> they've already been through some troubled times since Parallel Query
> was born. I'd hope they've already put up their defenses due to the
> advent of that feature. I stand by my thoughts that it's pretty
> bizarre to draw the line here when we've probably been causing these
> people issues for many years already. I said my piece on this already
> so likely not much point in going on about it. These people are also
> perfectly capable of sidestepping the whole issue with setting
> max_parallel_workers_per_gather TO 0.
>
> Perhaps one solution is to drop the string_agg and keep the array_agg.
> Unsure if that would be good enough for Tom? More people seem to have
> voiced that array_agg ordering is generally less of a concern, which I
> imagine is probably true, but my opinion might not matter here as I'm
> the sort of person who if I needed a specific ordering I'd have
> written an ORDER BY clause.
>

I don't really see how dropping string_agg would solve anything. Either
the risks apply to both these functions, or neither of them I think. If
anything, doing this only for one of them would be inconsistent and
surprising, considering how similar they are.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2018-04-05 20:51:41 Re: Online enabling of checksums
Previous Message Tom Lane 2018-04-05 20:47:28 Re: WIP: a way forward on bootstrap data