Quick Links

Re: ANY_VALUE aggregate

From:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	Vik Fearing <vik(at)postgresfriends(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: ANY_VALUE aggregate
Date:	2022-12-08 05:48:16
Message-ID:	CAKFQuwYK_UeLFUrfzJxsdmzMv-0uYQO8v3X96jYp=vkzRzwvmg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Dec 7, 2022 at 10:00 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:

> On 12/7/22 04:22, David G. Johnston wrote:
> > On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik(at)postgresfriends(dot)org>
> wrote:
> >
> >> On 12/6/22 05:57, David G. Johnston wrote:
> >>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik(at)postgresfriends(dot)org>
> >> wrote:
> >>>
> >>>> I can imagine an optimization that would remove an ORDER BY clause
> >>>> because it isn't needed for any other aggregate.
> >>>
> >>>
> >>> I'm referring to the query:
> >>>
> >>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
> >>> // produces 1, per the documented implementation-defined behavior.
> >>
> >> Implementation-dependent. It is NOT implementation-defined, per spec.
> >
> > I really don't care all that much about the spec here given that ORDER BY
> > in an aggregate call is non-spec.
>
>
> Well, this is demonstrably wrong.
>
> <array aggregate function> ::=
> ARRAY_AGG <left paren>
> <value expression>
> [ ORDER BY <sort specification list> ]
> <right paren>
>

Demoable only by you and a few others...

We should update our documentation - the source of SQL Standard knowledge
for mere mortals.

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

"Note: The ability to specify both DISTINCT and ORDER BY in an aggregate
function is a PostgreSQL extension."

Apparently only DISTINCT remains as our extension.

>
> > You are de-facto creating a first_value aggregate (which is by definition
> > non-standard) whether you like it or not.
>
>
> I am de jure creating an any_value aggregate (which is by definition
> standard) whether you like it or not.
>

Yes, both statements seem true. At least until we decide to start ignoring
a user's explicit order by clause.

>
> >> If you care about which value you get back, use something else.
> >
> > There isn't a "something else" to use so that isn't presently an option.
>
>
> The query
>
> SELECT proposed_first_value(x ORDER BY y) FROM ...
>
> is equivalent to
>
> SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...
>
> so I am not very sympathetic to your claim of "no other option".
>

Semantically, yes, in terms of performance, not so much, for any
non-trivial sized group.

I'm done, and apologize for getting too emotionally invested in this. I
hope to get others to voice enough +1s to get a first_value function into
core along-side this one (which makes the above discussion either moot or
deferred until there is a concrete use case for ignoring an explicit ORDER
BY). If that doesn't happen, well, it isn't going to make or break us
either way.

David J.

In response to

Re: ANY_VALUE aggregate at 2022-12-08 05:00:54 from Vik Fearing

Responses

Re: ANY_VALUE aggregate at 2022-12-08 12:32:30 from Vik Fearing

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jeff Davis	2022-12-08 05:56:38	Re: Collation version tracking for macOS
Previous Message	Peter Smith	2022-12-08 05:20:04	Re: PGDOCS - Logical replication GUCs - added some xrefs