Re: Parallel Aggregates for string_agg and array_agg

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2022-12-27 07:48:11
Message-ID: CAFj8pRD12EyhXPZGs_q3D0HfzjxvXD4i9Wo5kGCj-3z41tmp7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

út 27. 12. 2022 v 6:26 odesílatel David Rowley <dgrowleyml(at)gmail(dot)com>
napsal:

> On Wed, 20 Jun 2018 at 19:08, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
> wrote:
> > I'll submit it again when there more consensus that we want this.
>
> Waking up this old thread again. If you don't have a copy, the entire
> thread is in [1].
>
> The remaining item that seemed to cause this patch to be rejected was
> raised in [2]. The summary of that was that it might not be a good
> idea to allow parallel aggregation of string_agg() and array_agg() as
> there might be some people who rely on the current ordering they get
> without having an ORDER BY clause in the aggregate function call. Tom
> mentioned in [3] that users might not want to add an ORDER BY to their
> aggregate function because the performance of it is terrible. That
> was true up until 1349d2790 [4], where I changed how ORDER BY /
> DISTINCT aggregation worked to allow the planner to provide pre-sorted
> input rather than always having nodeAgg.c do the sorting. I think
> this removes quite a lot of the argument against the patch, but not
> all of it. So here goes testing the water on seeing if any opinions
> have changed over the past few years?
>
> A rebased patch is attached.
>

The technical part of this patch was discussed before, and there were no
objections against it.

I did some performance benchmarks and I agree with David. Sure - it can
change the ordering, but without the ORDER BY clause, the order depended
before too. And the performance with ORDER clause is not changed with and
without this patch.

I can confirm that all regress tests passed, and there are not any problems
with compilation.

I'll mark this patch as ready for committer

Regards

Pavel

>
> David
>
> [1]
> https://www.postgresql.org/message-id/flat/CAKJS1f98yPkRMsE0JnDh72%3DAQEUuE3atiCJtPVCtjhFwzCRJHQ%40mail.gmail.com#8bbce15b9279d2da2da99071f732a99d
> [2] https://www.postgresql.org/message-id/6538.1522096067@sss.pgh.pa.us
> [3] https://www.postgresql.org/message-id/18594.1522099194@sss.pgh.pa.us
> [4]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1349d2790bf48a4de072931c722f39337e72055e
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2022-12-27 08:14:13 RE: Exit walsender before confirming remote flush in logical replication
Previous Message Hayato Kuroda (Fujitsu) 2022-12-27 07:47:52 RE: New strategies for freezing, advancing relfrozenxid early