Re: Parallel Aggregates for string_agg and array_agg

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-05-01 21:09:39
Message-ID: 1C2959D0-56F2-4067-B2AC-DF9A3B1D0FB5@gmail.com
Lists: pgsql-hackers


> On Mar 27, 2018, at 7:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>> synchronize_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point. I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I frequently care about the order, but only to the extent that the order
is consistent across aggregates over several different columns, along the
lines of:

select array_agg(a) AS x, array_agg(b) AS y
from generate_a_b_func(foo);

I don't care which order the data is in, as long as x[i] and y[i] are
matched correctly. It sounds like this patch would force me to write
that as, for example:

select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
from generate_a_b_func(foo);

which I did not need to do before. I would expect a performance regression
from the two newly required sorts. So in that case I agree with Tom.

But I also agree with others that I want the parallel aggregation functionality.
Could we perhaps introduce an option that forces the aggregate to preserve
its input order? Something like:

select array_agg(a order undisturbed) AS x, array_agg(b order undisturbed) AS y
from generate_a_b_func(foo);

which would not perform an extra sort but would guarantee not to disturb
the pre-existing order of the rows coming from generate_a_b_func(foo).

I don't care about the syntax or keywords in the example above. I'm just
looking to get the benefits of parallel aggregation when I don't care about
ordering, while preserving the order in the cases where it matters.

mark
