Re: Document aggregate functions better w.r.t. ORDER BY

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Document aggregate functions better w.r.t. ORDER BY
Date: 2023-10-25 01:45:48
Message-ID: CAKFQuwZCZ5P09wBJGSDEQiGZooa21bJjFz1FEjvCs1_hHaB-Ow@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 24, 2023 at 1:39 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Dec 13, 2022 at 07:38:15PM -0700, David G. Johnston wrote:
> > All,
> >
> > The recent discussion surrounding aggregates and ORDER BY moved me to look
> > over our existing documentation, especially now that we've reworked the
> > function tables, to see what improvements can be had by simply documenting
> > those functions where ORDER BY may change the user-visible output. I
> > skipped range aggregates for the moment but handled the others on the
> > aggregates page (not window functions). This includes the float types for
> > sum and avg.
> >
> > I added a note just before the table linking back to the syntax chapter and
> > describing the newly added rules and syntax choice in the table.
> >
> > The nuances of floating point math suggest to me that specifying order by
> > for those is in some kind of gray area and so I've marked it optional...any
> > suggestions for wording (or an xref) to explain those nuances or should it
> > just be shown non-optional like the others? Or not shown at all?
> >
> > The novelty of my examples is up for bikeshedding. I didn't want anything
> > too long so a subquery didn't make sense, and I was trying to avoid
> > duplication as well as multiple lines - hence creating a CTE that can be
> > copied onto all of the example queries to produce the noted result.
> >
> > I added a DISTINCT example to array_agg because it is the first aggregate on
> > the page and so hopefully will be seen during a cursory reading. Plus,
> > array_agg is the go-to function for doing this kind of experimentation.
>
> I like this idea, though the examples seemed too detailed so I skipped
> them. Here is the trimmed-down patch I would like to apply.
>
>
I'd prefer to keep the text pointing out that the functions documented this
way are the ones whose output can vary with input ordering.

I've been sympathetic to the user comments that we don't have enough
examples. Using just array_agg for that purpose, showing both DISTINCT and
ORDER BY, seems like a fair compromise (it removes two from my original
proposal). The examples in the section we tell them to go see aren't of
great quality. If you strongly dislike having the function table contain
the examples, we should at least improve the page we are sending them to.
(As an aside, I've personally always found the syntax block there, with its
five syntaxes, intimidating and hard to read.)

I'd at least suggest you reconsider the commentary and examples surrounding
jsonb_object_agg.
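
To illustrate why ordering is user-visible for an object-building aggregate, here is a rough Python analogy rather than PostgreSQL itself. It assumes that, when keys repeat, the last value aggregated for a key is the one kept (as jsonb does for duplicate keys on input); the `object_agg` helper and the sample rows are hypothetical.

```python
# Hypothetical input rows: (key, value) pairs with a repeated key "a".
rows = [("a", 1), ("b", 2), ("a", 3)]


def object_agg(pairs):
    """Collect key/value pairs into a dict; a later pair overwrites an earlier one."""
    result = {}
    for key, value in pairs:
        result[key] = value
    return result


# Same rows, two different input orders.
forward = object_agg(rows)
backward = object_agg(reversed(rows))

print(forward)   # value kept for "a" is the one seen last in forward order
print(backward)  # value kept for "a" is the one seen last in reverse order
```

Under that assumption the two orderings disagree on the value stored for "a", which is the kind of result the commentary and an ORDER BY example would explain.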

The same goes for the special knowledge of floating-point behavior that
explains why we've chosen to document avg/sum, which typically don't care
about order, as having an optional ORDER BY.
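
As a minimal sketch of the floating-point nuance, the following Python snippet (not PostgreSQL; the values and the `running_sum` helper are made up for illustration) adds the same three double-precision numbers in two different orders. It assumes a plain left-to-right accumulation, which is only an approximation of how a float sum aggregate might process its input.

```python
def running_sum(values):
    """Add values left to right, the way a simple accumulator would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three values in two different input orders.
large_first = running_sum([1e16, 1.0, -1e16])   # the 1.0 is lost when added to 1e16
cancel_first = running_sum([1e16, -1e16, 1.0])  # the large values cancel before 1.0 is added

print(large_first, cancel_first)
```

The two totals differ even though the set of inputs is identical, which is the reason an ORDER BY can change what sum or avg returns for float inputs.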

David J.
