Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2023-01-18 10:52:24
Message-ID: CAEZATCVG8SZhxr6JDnPH2ktgDznktcNitZBn9dB7mb20+f4GRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 18 Jan 2023 at 09:49, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Wed, 18 Jan 2023 at 22:37, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > I'm still confused that when the same scenario happens with ORDER BY in
> > an aggregate function, like in query 1, the result is different in that
> > we would get an unsorted output.
> >
> > I wonder if we should avoid this inconsistent behavior.
>
> It certainly seems pretty strange that aggregates with an ORDER BY
> behave differently from the query's ORDER BY. I'd have expected that
> to be the same. I've not looked to see why there's a difference, but
> suspect that we thought about how we want it to work for the query's
> ORDER BY and when ORDER BY aggregates were added, that behaviour was
> not considered.
>

I think the behaviour of an ORDER BY in the query can also be pretty
surprising. For example, consider:

SELECT ARRAY[random(), random(), random()]
FROM generate_series(1, 3);

array
-------------------------------------------------------------
{0.2335800863701647,0.14688842754711273,0.2975659224823368}
{0.10616525384762876,0.8371175798972244,0.2936178886154661}
{0.21679841321788262,0.5254761982948826,0.7789412240118161}
(3 rows)

which produces 9 different random values, as expected, and compare that to:

SELECT ARRAY[random(), random(), random()]
FROM generate_series(1, 3)
ORDER BY random();

array
---------------------------------------------------------------
{0.01952216253949679,0.01952216253949679,0.01952216253949679}
{0.6735145595500629,0.6735145595500629,0.6735145595500629}
{0.9406665780147616,0.9406665780147616,0.9406665780147616}
(3 rows)

which now only has 3 distinct random values. It's pretty
counterintuitive that adding an ORDER BY clause changes the contents
of the rows returned, not just their order.

The trouble is, if we tried to fix that, we'd risk changing some other
behaviour that users may have come to rely on.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-01-18 11:04:15 Re: heapgettup refactoring
Previous Message Peter Eisentraut 2023-01-18 10:36:18 Re: [DOCS] Stats views and functions not in order?