Re: Parallel Aggregates for string_agg and array_agg

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-26 21:19:54
Message-ID: 18594.1522099194@sss.pgh.pa.us
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On 03/26/2018 10:27 PM, Tom Lane wrote:
>> I fear that what will happen, if we commit this, is that something like
>> 0.01% of the users of array_agg and string_agg will be pleased, another
>> maybe 20% will be unaffected because they wrote ORDER BY which prevents
>> parallel aggregation, and the remaining 80% will scream because we broke
>> their queries. Telling them they should've written ORDER BY isn't going
>> to cut it, IMO, when the benefit of that breakage will accrue only to some
>> very tiny fraction of use-cases.

> Isn't the ordering unreliable *already*?

Not if the query is such that what gets chosen is, say, an indexscan or
mergejoin. It might be theoretically unreliable and yet work fine for
a given application.
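Here is a toy Python model of that point. It is not PostgreSQL's executor, and several parts of it are hypothetical simplifications: the function names, the row-interleaved split of input across workers, and the `finish_order` parameter. The model assumes a combine step that concatenates per-worker partial arrays. A serial aggregate fed by an ordered scan (for example an indexscan) appends rows in scan order. A parallel plan instead builds one partial state per worker and concatenates those partials in whatever order the workers happen to finish.

```python
from typing import Sequence


def serial_array_agg(rows: Sequence[int]) -> list[int]:
    """Transition step only: append each row in the order the scan delivers it."""
    state: list[int] = []
    for r in rows:
        state.append(r)
    return state


def parallel_array_agg(
    rows: Sequence[int], n_workers: int, finish_order: Sequence[int]
) -> list[int]:
    """Toy partial/combine aggregation (hypothetical model, not PostgreSQL code)."""
    # Each worker aggregates an interleaved share of the rows (simplified split).
    partials = [serial_array_agg(rows[w::n_workers]) for w in range(n_workers)]
    # Combine step: concatenate partial states in the order workers finish.
    result: list[int] = []
    for w in finish_order:
        result.extend(partials[w])
    return result


rows = [1, 2, 3, 4, 5, 6]  # e.g. delivered in order by an indexscan
print(serial_array_agg(rows))                # [1, 2, 3, 4, 5, 6]
print(parallel_array_agg(rows, 2, [1, 0]))   # [2, 4, 6, 1, 3, 5]
```

The parallel result holds the same elements but in a different order, and that order depends on worker timing. An application that has been relying on the serial order would see its output change without any change to its SQL.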

I might be too pessimistic about the fraction of users who are depending
on ordered input without having written anything that explicitly forces
that ... but I stand by the theory that it substantially exceeds the
fraction of users who could get any benefit.

Your own example of assuming that separate aggregates are computed
in the same order reinforces my point, I think. In principle, anybody
who's doing that should write

	array_agg(e order by x),
	array_agg(f order by x),
	string_agg(g, ',' order by x)

because otherwise they shouldn't assume that; the manual certainly doesn't
promise it. But nobody does that in production, because if they did
they'd get killed by the fact that the sorts are all done independently.
(We should improve that someday, but it hasn't been done yet.) So I think
there are an awful lot of people out there who are assuming more than a
lawyerly reading of the manual would allow. Their reaction to this will
be about like ours every time the GCC guys decide that some longstanding
behavior of C code isn't actually promised by the text of the C standard.
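The cost argument can be illustrated with a rough Python sketch. It is a toy model, not how the executor is written. The row layout `(x, e, f, g)`, the function names, and the ',' delimiter are all hypothetical. The first function mimics today's behaviour, in which each aggregate with its own ORDER BY sorts its input separately. The second shows the kind of shared sort that the "improve that someday" remark points at, where one sort feeds all three aggregates.

```python
from operator import itemgetter

# Hypothetical input rows with columns (x, e, f, g).
Row = tuple[int, str, str, str]


def independent_sorts(rows: list[Row]) -> tuple[list[str], list[str], str, int]:
    """Each ORDER BY aggregate sorts the input on its own: three sorts."""
    by_x = itemgetter(0)
    sorts = 0
    e_agg = [r[1] for r in sorted(rows, key=by_x)]  # array_agg(e order by x)
    sorts += 1
    f_agg = [r[2] for r in sorted(rows, key=by_x)]  # array_agg(f order by x)
    sorts += 1
    g_agg = ",".join(r[3] for r in sorted(rows, key=by_x))  # string_agg(g, ',' order by x)
    sorts += 1
    return e_agg, f_agg, g_agg, sorts


def shared_sort(rows: list[Row]) -> tuple[list[str], list[str], str, int]:
    """Hypothetical improvement: sort once, feed the same ordered stream to all three."""
    ordered = sorted(rows, key=itemgetter(0))
    e_agg = [r[1] for r in ordered]
    f_agg = [r[2] for r in ordered]
    g_agg = ",".join(r[3] for r in ordered)
    return e_agg, f_agg, g_agg, 1


rows = [(3, "e3", "f3", "g3"), (1, "e1", "f1", "g1"), (2, "e2", "f2", "g2")]
print(independent_sorts(rows))  # same aggregates, 3 sorts
print(shared_sort(rows))        # same aggregates, 1 sort
```

Both functions return the same aggregates here because the `x` values are distinct. The only difference is how many sorts each one performs. This is why writing an explicit ORDER BY in every aggregate is correct in principle but expensive in practice.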

regards, tom lane
