Re: Document aggregate functions better w.r.t. ORDER BY

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Document aggregate functions better w.r.t. ORDER BY
Date: 2023-10-26 01:57:10
Message-ID: ZTnHdk833s1UG-Vi@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote:
> The paragraph leading into the last added example needs to be tweaked:
>
> If DISTINCT is specified within an aggregate, the data is sorted in ascending
> order while extracting unique values.  You can add an ORDER BY clause, limited
> to expressions matching the regular arguments of the aggregate, to sort the
> output in descending order.
>
> (show existing - DISTINCT only - example here)
>
> <programlisting>
> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
>  string_agg
> -----------
>   4;3;2;1
> </programlisting>
>
> (existing note)

I see what you mean. I added an example that doesn't match the existing
paragraph. I have rewritten the paragraph and used a relevant example;
patch attached.

> Question: Do you know whether we for certain always sort ascending here to
> compute the unique values or whether if, say, there is an index on the column
> in descending order (or ascending and traversed backwards) that the data within
> the aggregate could, with an order by, be returned in descending order?  If it
> is ascending, is that part of the SQL Standard (since it doesn't even allow an
> order by to give the user the ability the control the output ordering) or does
> the SQL Standard expect that even a random order would be fine since there are
> algorithms that can be used that do not involve sorting the input?

I don't think order is ever guaranteed in the standard without an ORDER
BY.

> It seems redundant to first say "regular arguments" then negate it in order to
> say "DISTINCT list".  Using the positive form with "DISTINCT list" should get
> the point across sufficiently and succinctly.  It also avoids me feeling like
> there should be an example of what happens when you do "sort on an expression
> that is not included in the DISTINCT list".

Agreed, I rewrote that.

> Interestingly:
>
> WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;
>
> ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in
> argument list
> LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...
>
> But both expressions in the argument list (el and semicolon) do appear in the
> ORDER BY...

I think ORDER BY has to match DISTINCT columns, while you are using ';'.
I used a simpler example with array_agg() in my patch to avoid the issue.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

Attachment Content-Type Size
agg_order.diff text/x-diff 5.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-10-26 01:59:58 Re: Introduce a new view for checkpointer related stats
Previous Message Daniele Varrazzo 2023-10-26 01:31:33 Re: libpq async connection and multiple hosts