Re: Parallel Aggregates for string_agg and array_agg

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-05-01 22:14:25
Message-ID: CAKJS1f-3fs_u235GKt0fC7D+9vJ9ApB-c1tDVTB_U0HUdGxw-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 May 2018 at 08:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Mar 26, 2018 at 4:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I fear that what will happen, if we commit this, is that something like
>>> 0.01% of the users of array_agg and string_agg will be pleased, another
>>> maybe 20% will be unaffected because they wrote ORDER BY which prevents
>>> parallel aggregation, and the remaining 80% will scream because we broke
>>> their queries. Telling them they should've written ORDER BY isn't going
>>> to cut it, IMO, when the benefit of that breakage will accrue only to some
>>> very tiny fraction of use-cases.
>
>> I think your estimated percentages here are wildly inaccurate.
>
> My estimate for the number of people positively impacted could be off
> by a factor of a thousand, and it still wouldn't change the conclusion
> that this will hurt more people than it helps.

It's probably best to have this argument again in 6 months or so.
Someone might decide to put some effort into teaching the planner
about ordered aggregates so it can choose to provide pre-sorted input
to aggregate functions so that nodeAgg.c no longer has to perform the
costly explicit sorts. Windowing functions already do this for the
first window clause, so it might not be too hard to do it for the
first aggregate, and any subsequent aggregates which share the same
ORDER BY / DISTINCT clause. We'd probably need to add some sort of
evaluation order to the Aggref, but ... details need not be discussed
here ...

I think if we had that, then the current objection to the patch loses
quite a bit of traction.

I imagine Tom will give up his objection given that he wrote this:

On 27 March 2018 at 11:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Maybe what that says is that rather than giving up on this altogether,
> we should shelve it till we have less stupid planner+executor behavior
> for ORDER BY inside aggregates. That's been on the to-do list for
> a long while, but not risen to the top ...

So it seems there's probably room here to please everyone.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-05-01 22:29:17 Re: A few warnings on Windows
Previous Message Andres Freund 2018-05-01 21:38:32 Re: Parallel Aggregates for string_agg and array_agg