Re: Document aggregate functions better w.r.t. ORDER BY

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Document aggregate functions better w.r.t. ORDER BY
Date: 2023-10-26 21:56:44
Message-ID: ZTrgnM4GGNM9SniX@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:
> I would reword the existing note to be something like:
>
> The SQL Standard defines specific aggregates and their properties, including
> which of DISTINCT and/or ORDER BY is allowed.  Due to the extensible nature of
> PostgreSQL it accepts either or both clauses for any aggregate.

Uh, is this something in my patch or somewhere else? I don't think
PostgreSQL extensible is an example of syntax flexibility.

> From the most recent patch:
>
>     <para>
> -    If <literal>DISTINCT</literal> is specified in addition to an
> -    <replaceable>order_by_clause</replaceable>, then all the <literal>ORDER BY
> </literal>
> -    expressions must match regular arguments of the aggregate; that is,
> -    you cannot sort on an expression that is not included in the
> -    <literal>DISTINCT</literal> list.
> +    If <literal>DISTINCT</literal> is specified with an
> +    <replaceable>order_by_clause</replaceable>, <literal>ORDER
> +    BY</literal> expressions can only reference columns in the
> +    <literal>DISTINCT</literal> list.  For example:
> +<programlisting>
> +WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> +SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
> +  array_agg
> +-------------
> + {Z,T,R,D,A}
> +</programlisting>
>
> The change to a two-column vals was mostly to try and find corner-cases that
> might need to be addressed.  If we don't intend to show the error case of
> DISTINCT v1 ORDER BY v2 then we should go back to the original example and just
> add ORDER BY v DESC.  I'm fine with not using string_agg here.
>
> +    For example:
> +<programlisting>
> +WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> +SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
> + array_agg
> +-----------
> + {4,3,2,1}
> +</programlisting>

Okay, good, switched in the attached patch.

> We get enough complaints regarding "apparent ordering" that I would like to
> add:
>
> As a reminder, while some DISTINCT processing algorithms produce sorted output
> as a side-effect, only by specifying ORDER BY is the output order guaranteed.

Well, we need to create a new email thread for this and look at all the
areas is applies to since this is a much larger issue.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachment Content-Type Size
agg_order.diff text/x-diff 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-10-26 21:57:57 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message David Rowley 2023-10-26 21:53:29 Re: Making aggregate deserialization (and WAL receive) functions slightly faster