Re: Parallel Aggregates for string_agg and array_agg

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-27 07:06:59
Message-ID: CABUevEzy6=AT3meQf7q4jfP65nWjEGjPT_8D9n+BxanEgVozWw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 12:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On 27 March 2018 at 09:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I do not think it is accidental that these aggregates are exactly the ones
> >> that do not have parallelism support today. Rather, that's because you
> >> just about always have an interest in the order in which the inputs get
> >> aggregated, which is something that parallel aggregation cannot support.
>
> > This very much reminds me of something that exists in the 8.4 release notes:
> >> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted output (Tom)
>
> That's a completely false analogy: we got a significant performance
> benefit for a significant fraction of users by supporting hashed
> aggregation. My argument here is that only a very tiny fraction of
> string_agg/array_agg users will not care about aggregation order, and thus
> I don't believe that this patch can help very many people. Against that,
> it's likely to hurt other people, by breaking their queries and forcing
> them to insert expensive explicit sorts to fix it. Even discounting the
> backwards-compatibility question, we don't normally adopt performance
> features for which it's unclear that the net gain over all users is
> positive.
>

I think you are quite wrong in claiming that only a tiny fraction of the
users are going to care.

This may, and quite probably does, hold true for string_agg(), but not for
array_agg(). I see a lot of cases where people use that to load it into an
unordered array/hashmap/set/whatever on the client side, which loses
ordering *anyway*, and they would definitely benefit from it.
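To make the point concrete, here is a minimal sketch in Python. It assumes a simplified model in which each parallel worker builds a partial array_agg state (a list) over its share of the rows and the leader concatenates those partial states in whatever order the workers finish; the `combine` helper and the three-worker split are hypothetical and are not PostgreSQL's actual combine function. The element order of the final array varies with the combine order, but a client that treats the result as an unordered collection sees the same contents every time.

```python
from itertools import permutations

# Hypothetical model of combining partial array_agg states: the leader
# appends each worker's partial list in the order the workers finish.
def combine(partials):
    result = []
    for state in partials:
        result.extend(state)
    return result

# Rows split across three hypothetical workers.
partials = [[1, 2], [3, 4], [5]]

# Try every possible worker completion order.
results = [combine(list(order)) for order in permutations(partials)]

# The element order of the final array depends on the combine order...
distinct_orders = {tuple(r) for r in results}

# ...but an order-insensitive comparison (as a multiset, so duplicates
# would still count) sees identical contents every time.
distinct_multisets = {tuple(sorted(r)) for r in results}

print(len(distinct_orders))     # 6
print(len(distinct_multisets))  # 1
```

Only the first comparison depends on aggregation order. That is the situation where, as Tom notes above, an explicit sort would be needed.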

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
