Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2023-01-11 05:23:50
Message-ID: CAApHDvov5upudo+03svepBJZ2o7DGtU7X+biVX12ni6ojUDn9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 11 Jan 2023 at 17:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > I think whatever the fix is here, we should likely ensure that the
> > results are consistent regardless of which Aggrefs are the presorted
> > ones. Perhaps the easiest way to do that, and to ensure we call the
> > volatile functions are called the same number of times would just be
> > to never choose Aggrefs with volatile functions when doing
> > make_pathkeys_for_groupagg().
>
> There's existing logic in equivclass.c and other places that tries
> to draw very tight lines around what we'll assume about volatile
> sort expressions (pathkeys). It sounds like there's someplace in
> this recent patch that didn't get that memo.

I'm not sure I did a good job of communicating my thoughts there. What
I mean is, having volatile functions in the aggregate's ORDER BY or
DISTINCT clause didn't seem very well behaved prior to the presorted
aggregates patch. If I go and fix the bug with the missing targetlist
items, then a query such as:

select string_agg(random()::text, ',' order by random()) from
generate_series(1,3);

should start putting the random() numbers in order where it didn't
prior to 1349d279. Perhaps users might be happy that those are in
order, however, if they then go and change the query to:

select sum(a order by a),string_agg(random()::text, ',' order by
random()) from generate_series(1,3);

then they might become unhappy again that their string_agg is not
ordered the way they specified because the planner opted to sort by
"a" rather than "random()" after the initial scan.

I'm wondering if 1349d279 should have just never opted to presort
Aggrefs which have volatile functions so that the existing behaviour
of unordered output is given always and nobody is fooled into thinking
this works correctly only to be disappointed later when they add some
other aggregate to their query or if we should fix both. Certainly,
it seems much easier to do the former.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-01-11 05:27:09 Re: [PATCH] Support using "all" for the db user in pg_ident.conf
Previous Message Michael Paquier 2023-01-11 04:37:51 Re: Add a new pg_walinspect function to extract FPIs from WAL records