Re: Parallel Aggregates for string_agg and array_agg

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-26 21:54:32
Message-ID: CAKJS1f_n_Z+cGjG5SGeKxhCpRmpEifcoTgEqXkH2sra_ze8KJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27 March 2018 at 10:20, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> In the end, I do tend to agree that we probably should add parallel
> support to these aggregates, but it'd also be nice to hear from those
> who had worked to add parallelism to the various aggregates as to if
> there was some reason these were skipped.

That was me. I originally skipped string_agg and array_agg as I
thought parallelism would not be much use due to the growth of the
size of the state. I thought the combine function could become very
expensive as it has to work on each aggregated value, instead of a
consolidated result, as the other aggregates provide. The reason for
adding this now and not back then is:

1. I was wrong, parallel aggregation can improve performance for these
aggregates, and;
2. It's annoying that a single non-partial aggregate in the query
disabled parallel aggregation, and;
3. When I originally considered the performance I didn't consider that
some queries may filter values before aggregation via a FILTER or a
WHERE clause.
4. It's not just parallel aggregation that uses partial aggregation,
we can now push aggregation below Append for partitioned tables in
some cases. Hopefully, we'll also have "group before join" one day
too.

While 2 is still true, there should be a reduction in the times this
happens. Just the json and xml aggs holding us back from the standard
set now.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-26 22:03:23 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Robert Haas 2018-03-26 21:38:44 Re: [HACKERS] why not parallel seq scan for slow functions