Re: ANY_VALUE aggregate

From: Vik Fearing <vik(at)postgresfriends(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ANY_VALUE aggregate
Date: 2022-12-08 05:00:54
Message-ID: e66347b4-670f-eed6-8809-efd9184e0dd1@postgresfriends.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/7/22 04:22, David G. Johnston wrote:
> On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing <vik(at)postgresfriends(dot)org> wrote:
>
>> On 12/6/22 05:57, David G. Johnston wrote:
>>> On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing <vik(at)postgresfriends(dot)org>
>> wrote:
>>>
>>>> I can imagine an optimization that would remove an ORDER BY clause
>>>> because it isn't needed for any other aggregate.
>>>
>>>
>>> I'm referring to the query:
>>>
>>> select any_value(v order by v) from (values (2),(1),(3)) as vals (v);
>>> // produces 1, per the documented implementation-defined behavior.
>>
>> Implementation-dependent. It is NOT implementation-defined, per spec.
>
> I really don't care all that much about the spec here given that ORDER BY
> in an aggregate call is non-spec.

Well, this is demonstrably wrong.

<array aggregate function> ::=
ARRAY_AGG <left paren>
<value expression>
[ ORDER BY <sort specification list> ]
<right paren>

>> We often loosen the spec rules when they don't make technical sense to
>> us, but I don't know of any example of when we have tightened them.
>
> The function has to choose some row from among its inputs,

True.

> and the system has to obey an order by specification added to the function call.

False.

> You are de-facto creating a first_value aggregate (which is by definition
> non-standard) whether you like it or not.

I am de jure creating an any_value aggregate (which is by definition
standard) whether you like it or not.

> I'm just saying to be upfront
> and honest about it - our users do want such a capability so maybe accept
> that there is a first time for everything. Not that picking an
> advantageous "implementation-dependent" implementation should be considered
> deviating from the spec.
>
>
>>> Someone writing:
>>>
>>> select any_value(v) from (values (2),(1),(3)) as vals (v) order by v;
>>>
>>> Is not presently, nor am I saying, promised the value 1.
>>>
>>> I'm assuming you are thinking of the second query form, while the
>> guarantee
>>> only needs to apply to the first.
>>
>> I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could
>> bestow upon ANY_VALUE the ability to make those two queries equivalent.
>>
>
> That theoretical idea should not be entertained. Removing a user's
> explicitly added ORDER BY should be off-limits. Any approach at
> optimization here should simply look at whether an ORDER BY is specified
> and pass that information to the function. If the function itself really
> believes that ordering matters it can emit its own runtime exception
> stating that fact and the user can fix their query.

It absolutely should be entertained, and I plan on doing so in an
upcoming thread. Whether it errors or ignores is something that should
be discussed on that thread.

>> If you care about which value you get back, use something else.
>
> There isn't a "something else" to use so that isn't presently an option.

The query

SELECT proposed_first_value(x ORDER BY y) FROM ...

is equivalent to

SELECT (ARRAY_AGG(x ORDER BY y))[1] FROM ...

so I am not very sympathetic to your claim of "no other option".

> I suppose it comes down to what level of belief and care you have that
> people will simply mis-use this function if it is added in its current form
> to get the desired first_value effect that it produces.

People who rely on explicitly undefined behavior get what they deserve
when the implementation changes.
--
Vik Fearing

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-12-08 05:07:15 Re: Support logical replication of DDLs
Previous Message Amit Kapila 2022-12-08 04:53:40 Re: Avoid streaming the transaction which are skipped (in corner cases)