Re: [patch] bit XOR aggregate functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Vik Fearing <vik(at)postgresfriends(dot)org>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "bashtanov(at)imap(dot)cc" <bashtanov(at)imap(dot)cc>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] bit XOR aggregate functions
Date: 2021-03-07 12:03:49
Message-ID: CAFj8pRB_yjdpt-NaJJoAWirPcdTPVETKxkJk-G1AVHt-sL+fjQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, 7 Mar 2021 at 12:39, Vik Fearing <vik(at)postgresfriends(dot)org>
wrote:

> On 3/7/21 11:37 AM, Pavel Stehule wrote:
> > On Sun, 7 Mar 2021 at 11:28, Vik Fearing <vik(at)postgresfriends(dot)org>
> > wrote:
> >
> >> On 3/7/21 11:24 AM, Pavel Stehule wrote:
> >>>>
> >>>> And so you are now mandating an ORDER BY on every query and in every
> >>>> aggregate and/or window function. Users will not like that at all. I
> >>>> certainly shan't.
> >>>>
> >>>
> >>> The mandatory ORDER BY clause should be necessary for operations when
> >>> the result depends on the order. You need an order for calculation of
> >>> median. And you don't need to know an order for average. More if the
> >>> result is one number and is not possible to do a visual check of
> >>> correctness (like median).
> >>
> >> The syntax for median (percentile_cont(0.5)) already requires an order
> >> by clause. You are now requiring one on array_agg().
> >>
> >
> > array_agg is debatable, because PostgreSQL arrays are an ordered set type.
> > But a very common usage is using arrays as unordered sets (because
> > ANSI/SQL sets are not supported). But anyway - for arrays I can do a visual
> > check of whether it is ordered well or not.
>
> If by "visual check" you mean "with my human eyeballs" then I would
> argue that that is always the case and we don't need nannying for other
> aggregates either.
>

The correct solution is to use arrays as arrays and sets as sets. When
you mix two different features into one, you will have problems.

But if I see {{1,2,3},{3,4,5}}, I have some knowledge - it is not 100%, but
it is something. If I get 27373 as the result of a median, I have no other
information.

The design of arrays (in pg) was incremental - it is older than Postgres's
support for ordered aggregates, and probably older than the introduction of
sets in ANSI/SQL. So implementing strong safeguards is not possible for
compatibility reasons. If I were designing array_agg or string_agg today, I
would prefer to design them like ordered aggregates.
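
As a rough illustration of the distinction argued about in this thread (it is
not taken from the patch), the Python sketch below models two aggregates as
folds over rows that may arrive in any order. The function names bit_xor_agg
and array_agg are only hypothetical stand-ins for the SQL aggregates, and the
sample rows are made up. XOR is commutative and associative, so every input
order gives the same value; collecting rows into an array gives a different
result for each input order.

```python
from functools import reduce
from itertools import permutations
from operator import xor

# Made-up sample rows; a real query without ORDER BY may deliver them in any order.
rows = [5, 3, 9]

def bit_xor_agg(values):
    """Fold the values with XOR, starting from 0."""
    return reduce(xor, values, 0)

def array_agg(values):
    """Collect the values in arrival order."""
    return list(values)

# Run both aggregates over every possible arrival order of the rows.
xor_results = {bit_xor_agg(p) for p in permutations(rows)}
array_results = {tuple(array_agg(p)) for p in permutations(rows)}

print(len(xor_results))    # number of distinct XOR results
print(len(array_results))  # number of distinct array results
```

Under these assumptions only the array-style aggregate would gain anything
from an explicit ORDER BY.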

Sure - it is about life philosophy, about the projects you work on, and
about risks: some people prefer risks, some people prefer safeguards. I see
the complexity boom as a very big issue - I remember good programming books
of 50 pages, and now we either have to start from a green field again or
implement as many safeguards as possible to keep systems workable. But
anyway - a good system is robust, and robust systems try to reduce possible
errors as much as possible (human errors are the most common).

But this is off-topic for this discussion :)

--
> Vik Fearing
>
