Re: Parallel Aggregates for string_agg and array_agg

From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-27 07:22:39
Message-ID: CAA8=A7-OrZDOcRwZjqNv8nNEQzE8JSbjTsAjv2CK2zyq4jPe+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 5:36 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Mar 27, 2018 at 12:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> > On 27 March 2018 at 09:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >> I do not think it is accidental that these aggregates are exactly the
>> >> ones
>> >> that do not have parallelism support today. Rather, that's because you
>> >> just about always have an interest in the order in which the inputs get
>> >> aggregated, which is something that parallel aggregation cannot
>> >> support.
>>
>> > This very much reminds me of something that exists in the 8.4 release
>> > notes:
>> >> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce
>> >> sorted output (Tom)
>>
>> That's a completely false analogy: we got a significant performance
>> benefit for a significant fraction of users by supporting hashed
>> aggregation. My argument here is that only a very tiny fraction of
>> string_agg/array_agg users will not care about aggregation order, and thus
>> I don't believe that this patch can help very many people. Against that,
>> it's likely to hurt other people, by breaking their queries and forcing
>> them to insert expensive explicit sorts to fix it. Even discounting the
>> backwards-compatibility question, we don't normally adopt performance
>> features for which it's unclear that the net gain over all users is
>> positive.
>
>
> I think you are quite wrong in claiming that only a tiny fraction of the
> users are going to care.
>
> This may, and quite probably does, hold true for string_agg(), but not for
> array_agg(). I see a lot of cases where people use that to load it into an
> unordered array/hashmap/set/whatever on the client side, which looses
> ordering *anyway*,and they would definitely benefit from it.

Agreed, I have seen lots of uses of array_agg where the order didn't matter.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2018-03-27 07:24:07 Re: PQHost() undefined behavior if connecting string contains both host and hostaddr types
Previous Message Magnus Hagander 2018-03-27 07:06:59 Re: Parallel Aggregates for string_agg and array_agg