Re: ANY_VALUE aggregate

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ANY_VALUE aggregate
Date: 2022-12-07 03:22:36
Message-ID: CAKFQuwYXaQ9nY-drJ0fd-qDiaKqTfVUpEjxj=56FXxOF6vFogg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:

> On 12/6/22 05:57, David G. Johnston wrote:
> > On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik(at)postgresfriends(dot)org>
> wrote:
> >
> >> I can imagine an optimization that would remove an ORDER BY clause
> >> because it isn't needed for any other aggregate.
> >
> >
> > I'm referring to the query:
> >
> > select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
> > // produces 1, per the documented implementation-defined behavior.
>
> Implementation-dependent. It is NOT implementation-defined, per spec.
>
> I really don't care all that much about the spec here given that ORDER BY
in an aggregate call is non-spec.

We often loosen the spec rules when they don't make technical sense to
> us, but I don't know of any example of when we have tightened them.
>

The function has to choose some row from among its inputs, and the system
has to obey an order by specification added to the function call. You are
de-facto creating a first_value aggregate (which is by definition
non-standard) whether you like it or not. I'm just saying to be upfront
and honest about it - our users do want such a capability so maybe accept
that there is a first time for everything. Not that picking an
advantageous "implementation-dependent" implementation should be considered
deviating from the spec.

> > Someone writing:
> >
> > select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;
> >
> > Is not presently, nor am I saying, promised the value 1.
> >
> > I'm assuming you are thinking of the second query form, while the
> guarantee
> > only needs to apply to the first.
>
> I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
> bestow upon ANY_VALUE the ability to make those two queries equivalent.
>

That theoretical idea should not be entertained. Removing a user's
explicitly added ORDER BY should be off-limits. Any approach at
optimization here should simply look at whether an ORDER BY is specified
and pass that information to the function. If the function itself really
believes that ordering matters it can emit its own runtime exception
stating that fact and the user can fix their query.

> If you care about which value you get back, use something else.
>
>
There isn't a "something else" to use so that isn't presently an option.

I suppose it comes down to what level of belief and care you have that
people will simply mis-use this function if it is added in its current form
to get the desired first_value effect that it produces.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-12-07 03:25:01 Re: [RFC] building postgres with meson - v13
Previous Message Michael Paquier 2022-12-07 03:20:08 Re: Commit fest 2022-11