Ordering behavior for aggregates

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Subject: Ordering behavior for aggregates
Date: 2022-12-13 12:50:48
Message-ID: 1fd5ddb0-9603-8f0d-9fab-2ff45d75fc3d@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The standard only defines an ORDER BY clause inside of an aggregate for
ARRAY_AGG(). As an extension to the standard, we allow it for all
aggregates, which is very convenient for non-standard things like
string_agg().

However, it is completely useless for things like AVG() or SUM(). If
you include it, the aggregate will do the sort even though it is neither
required nor desired.

I am proposing something like pg_aggregate.aggordering which would be an
enum of behaviors such as f=Forbidden, a=Allowed, r=Required. Currently
all aggregates would have 'a' but I am thinking that a lot of them could
be switched to 'f'. In that case, if a user supplies an ordering, an
error is raised.

My main motivation behind this is to be able to optimize aggregates that
could stop early such as ANY_VALUE(), but also to self-optimize queries
written in error (or ignorance).

There is recurring demand for a first_agg() of some sort, and that one
(whether implemented in core or left to extensions) would use 'r' so
that an error is raised if the user does not supply an ordering.

I have not started working on this because I envision quite a lot of
bikeshedding, but this is the approach I am aiming for.

Thoughts?
--
Vik Fearing

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2022-12-13 12:55:27 Re: Ordering behavior for aggregates
Previous Message Amul Sul 2022-12-13 12:33:19 Re: Error-safe user functions