Re: Parallel Aggregates for string_agg and array_agg

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregates for string_agg and array_agg
Date: 2018-03-27 17:31:46
Message-ID: 05f1e4ea-4c73-3ac8-6dd1-14e0a94aa005@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/27/2018 04:58 PM, Tom Lane wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point. I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've
> instead selected straw men to attack.
>

I don't quite see what the straw men are, TBH. (And David's response was
obviously meant as a tongue-in-cheek.)

Let me try addressing your point, as stated in your earlier message.

On 03/27/2018 12:28 AM, Tom Lane wrote:
> My argument here is that only a very tiny fraction of
> string_agg/array_agg users will not care about aggregation order,
> and thus I don't believe that this patch can help very many people.
> Against that, it's likely to hurt other people, by breaking their
> queries and forcing them to insert expensive explicit sorts to fix
> it.
You're claiming this is a choice between "reliable aggregation order"
vs. "parallel aggregation." I find that dubious.

My argument is that we do not provide any reliable aggregation order,
unless the user supplies ORDER BY in the aggregate, or takes care about
input ordering (essentially doing a subquery with ORDER BY).

Users may rely on getting the "right" plan, but unless they take special
care (disabling some plans etc.) that's just pure luck. Not only is
there plenty of stuff that can trigger plan change (misestimates, data
growth, ...) but we also add new plans that don't produce the same
ordering (e.g. parallel versions of some plans).

And if they already take the extra care, they can disable parallel query
to get the same plan. It's true that's all or nothing, and they might be
benefiting from parallel query in different part of the query, in which
case they'll need to modify the query.

> Even discounting the backwards-compatibility question, we don't
> normally adopt performance features for which it's unclear that the
> net gain over all users is positive.
We only really have anecdotal data, both of us. We can make some
guesses, but that's about it.

I don't know anyone who would be relying on array_agg ordering without
the explicit ORDER BY somewhere (particularly not after major upgrades).
But obviously those issues only surface after the change is made.

I do however know people who are badly affected by array_agg() disabling
parallel aggregate for other aggregates. They don't care about aggregate
order (typically it's filtering + unnest or export).

> Maybe what that says is that rather than giving up on this
> altogether, we should shelve it till we have less stupid
> planner+executor behavior for ORDER BY inside aggregates. That's
> been on the to-do list for a long while, but not risen to the top
> ...
That would be a valid choice if there was a reliable aggregate order.
But AFAIK there isn't, as explained above.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-27 17:34:26 Re: JIT compiling with LLVM v12
Previous Message Robert Haas 2018-03-27 17:29:48 Re: [HACKERS] why not parallel seq scan for slow functions