Re: ANY_VALUE aggregate

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ANY_VALUE aggregate
Date: 2022-12-08 05:48:16
Message-ID: CAKFQuwYK_UeLFUrfzJxsdmzMv-0uYQO8v3X96jYp=vkzRzwvmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:

> On 12/7/22 04:22, David G. Johnston wrote:
> > On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik(at)postgresfriends(dot)org>
> wrote:
> >
> >> On 12/6/22 05:57, David G. Johnston wrote:
> >>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik(at)postgresfriends(dot)org>
> >> wrote:
> >>>
> >>>> I can imagine an optimization that would remove an ORDER BY clause
> >>>> because it isn't needed for any other aggregate.
> >>>
> >>>
> >>> I'm referring to the query:
> >>>
> >>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
> >>> // produces 1, per the documented implementation-defined behavior.
> >>
> >> Implementation-dependent. It is NOT implementation-defined, per spec.
> >
> > I really don't care all that much about the spec here given that ORDER BY
> > in an aggregate call is non-spec.
>
>
> Well, this is demonstrably wrong.
>
> <array aggregate function> ::=
> ARRAY_AGG <left paren>
> <value expression>
> [ ORDER BY <sort specification list> ]
> <right paren>
>

Demoable only by you and a few others...

We should update our documentation - the source of SQL Standard knowledge
for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.

>
> > You are de-facto creating a first_value aggregate (which is by definition
> > non-standard) whether you like it or not.
>
>
> I am de jure creating an any_value aggregate (which is by definition
> standard) whether you like it or not.
>

Yes, both statements seem true. At least until we decide to start ignoring
a user's explicit order by clause.

>
> >> If you care about which value you get back, use something else.
> >
> > There isn't a "something else" to use so that isn't presently an option.
>
>
> The query
>
> SELECT proposed_first_value(x ORDER BY y) FROM ...
>
> is equivalent to
>
> SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...
>
> so I am not very sympathetic to your claim of "no other option".
>

Semantically, yes, in terms of performance, not so much, for any
non-trivial sized group.

I'm done, and apologize for getting too emotionally invested in this. I
hope to get others to voice enough +1s to get a first_value function into
core along-side this one (which makes the above discussion either moot or
deferred until there is a concrete use case for ignoring an explicit ORDER
BY). If that doesn't happen, well, it isn't going to make or break us
either way.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2022-12-08 05:56:38 Re: Collation version tracking for macOS
Previous Message Peter Smith 2022-12-08 05:20:04 Re: PGDOCS - Logical replication GUCs - added some xrefs