From: | Mark Dilger <hornschnorter(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Aggregates for string_agg and array_agg |
Date: | 2018-05-01 21:09:39 |
Message-ID: | 1C2959D0-56F2-4067-B2AC-DF9A3B1D0FB5@gmail.com |
Lists: | pgsql-hackers |
> On Mar 27, 2018, at 7:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point. I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I frequently care about the order, but only to the extent that the order
is stable between aggregates of several different columns, along the lines
of:

    select array_agg(a) AS x, array_agg(b) AS y
    from generate_a_b_func(foo);

I don't care which order the data is in, as long as x[i] and y[i] are
matched correctly. It sounds like this patch would force me to write
that as, for example:

    select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
    from generate_a_b_func(foo);

which I did not need to do before. I would expect a performance regression
from the two newly required sorts. So in that case I agree with Tom.
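
To make "matched correctly" concrete, here is a minimal sketch. The CTE named src and its three rows are made up purely for illustration and stand in for generate_a_b_func(foo). The query re-pairs the two arrays element by element and checks that every (x[i], y[i]) is a row of the source, which is the only property needed here, independent of the order in which rows were aggregated:

```sql
-- Hypothetical stand-in for generate_a_b_func(foo): three (a, b) rows.
with src(a, b) as (
    values (1, 'one'), (2, 'two'), (3, 'three')
),
agg as (
    -- The two plain aggregates from the example above, with no ORDER BY.
    select array_agg(a) as x, array_agg(b) as y
    from src
)
-- Walk both arrays in lockstep; "matched" should be true on every row
-- if x[i] and y[i] always come from the same source row.
select u.a, u.b,
       (u.a, u.b) in (select a, b from src) as matched
from agg, unnest(agg.x, agg.y) as u(a, b);
```
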
But I also agree with others that I want the parallel aggregation functionality.
Could we perhaps introduce some option for the aggregate to force it to be
stable? Something like:

    select array_agg(a order undisturbed) AS x, array_agg(b order undisturbed) AS y
    from generate_a_b_func(foo);

which would not perform an extra sort operation but would guarantee to not
disturb the pre-existing sort order coming from generate_a_b_func(foo)?

I don't care about the syntax / keywords in the example above. I'm just
looking to get the benefits of the parallel aggregation when I don't care
about ordering while preserving the order for these cases where it matters.

mark