Re: Compatible defaults for LEAD/LAG

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Vik Fearing <vik(at)postgresfriends(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Compatible defaults for LEAD/LAG
Date: 2020-08-31 05:05:41
Message-ID: CAFj8pRAFhk_Ogoj7nmotAH-MTEJap35cGRZT9AShXBKUVumLig@mail.gmail.com
Lists: pgsql-hackers

On Sun, 30 Aug 2020 at 23:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > This is a nice example of usage of the anycompatible type (that is
> > consistent with other things in Postgres), but the standard says
> > something else. It can be easily solved with
> > https://commitfest.postgresql.org/28/2081/ - but Tom doesn't like
> > this patch.
> > I am more inclined to think that this feature should be implemented
> > differently - there is no strong reason to go against the standard or
> > against the implementations of other databases (and increase the costs
> > of porting).
> > Now the implementation is limited, but the allowed behaviour is 100% ANSI.
>
> I don't particularly buy this argument. The case at hand is what to do
> if we have, say,
>
> select lag(integer_column, 1, 1.2) over ...
>
> The proposed patch would result in the output being of type numeric,
> and any rows using the default would show "1.2". The spec says that
> the right thing is to return integer, and we should round the default
> to "1" to make that work. But
>
> (1) I doubt that anybody actually writes such things;
>
> (2) For anyone who does write it, the spec's behavior fails to meet
> the principle of least surprise. It is not normally the case that
> any information-losing cast would be applied silently within an
> expression.
>
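
To spell out the two outcomes being compared (just a sketch based on the
description above; "integer_column" and table "t" are hypothetical):

-- with the proposed anycompatible signature: the result type is numeric,
-- and rows that fall back to the default would show 1.2
select lag(integer_column, 1, 1.2) over (order by integer_column) from t;

-- per the SQL standard: the result type is integer, so the default 1.2
-- would be silently rounded to 1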

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# drop table foo;
DROP TABLE
postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It may look strange, but it is not an unusual pattern in SQL. I think it
is analogous to the DEFAULT clause in PL/pgSQL:

DECLARE a int DEFAULT 1.2;

The default value doesn't change the type of the variable. This is maybe
too simplistic an example; a slightly more realistic one:

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
...
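
Filled out, that fragment might look like this (a hypothetical function,
only to illustrate that the declared type of x wins over the type of its
default):

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric)
RETURNS int AS $$
DECLARE x int DEFAULT a;  -- x stays int; the numeric default is cast on assignment
BEGIN
  RETURN x;
END;
$$ LANGUAGE plpgsql;

Calling foo(1.6, 0) should then return 2, because the numeric value is
rounded when it is assigned to the int variable.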

I am worried about performance - if the default value can change the
result type, then some other things can stop working - like using an
index.
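
For example (a rough sketch only; actual plans depend on the version and
statistics), once an integer column is compared against a numeric value,
the column side is cast to numeric and a plain btree index on it typically
cannot be used:

postgres=# create table t(a int primary key);
postgres=# explain select * from t where a = 1;    -- index on "a" can be used
postgres=# explain select * from t where a = 1.1;  -- "a" is cast to numeric, seq scan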

In *this* case we are not speaking about operations between two
independent operands or function arguments. We are speaking about a value
and about a *default* for that value.

> So this deviation from spec doesn't bother me; we have much bigger ones.
>

OK, if it is acceptable to other people, I can accept it too - I understand
that it is a corner case and that there is some consistency with other
Postgres features.

Maybe this difference from the standard should be mentioned in the
documentation.

> My concern with this patch is what I said upthread: I'm not sure that
> we should be adjusting this behavior in a piecemeal fashion. I looked
> through pg_proc to find all the functions that have more than one any*
> argument, and found these:
>
> oid | prorettype
> -----------------------------------------------+------------
> lag(anyelement,integer,anyelement) | anyelement
> lead(anyelement,integer,anyelement) | anyelement
> width_bucket(anyelement,anyarray) | integer
> btarraycmp(anyarray,anyarray) | integer
> array_eq(anyarray,anyarray) | boolean
> array_ne(anyarray,anyarray) | boolean
> array_lt(anyarray,anyarray) | boolean
> array_gt(anyarray,anyarray) | boolean
> array_le(anyarray,anyarray) | boolean
> array_ge(anyarray,anyarray) | boolean
> array_append(anyarray,anyelement) | anyarray
> array_prepend(anyelement,anyarray) | anyarray
> array_cat(anyarray,anyarray) | anyarray
> array_larger(anyarray,anyarray) | anyarray
> array_smaller(anyarray,anyarray) | anyarray
> array_position(anyarray,anyelement) | integer
> array_position(anyarray,anyelement,integer) | integer
> array_positions(anyarray,anyelement) | integer[]
> array_remove(anyarray,anyelement) | anyarray
> array_replace(anyarray,anyelement,anyelement) | anyarray
> arrayoverlap(anyarray,anyarray) | boolean
> arraycontains(anyarray,anyarray) | boolean
> arraycontained(anyarray,anyarray) | boolean
> elem_contained_by_range(anyelement,anyrange) | boolean
> range_contains_elem(anyrange,anyelement) | boolean
> range_eq(anyrange,anyrange) | boolean
> range_ne(anyrange,anyrange) | boolean
> range_overlaps(anyrange,anyrange) | boolean
> range_contains(anyrange,anyrange) | boolean
> range_contained_by(anyrange,anyrange) | boolean
> range_adjacent(anyrange,anyrange) | boolean
> range_before(anyrange,anyrange) | boolean
> range_after(anyrange,anyrange) | boolean
> range_overleft(anyrange,anyrange) | boolean
> range_overright(anyrange,anyrange) | boolean
> range_union(anyrange,anyrange) | anyrange
> range_merge(anyrange,anyrange) | anyrange
> range_intersect(anyrange,anyrange) | anyrange
> range_minus(anyrange,anyrange) | anyrange
> range_cmp(anyrange,anyrange) | integer
> range_lt(anyrange,anyrange) | boolean
> range_le(anyrange,anyrange) | boolean
> range_ge(anyrange,anyrange) | boolean
> range_gt(anyrange,anyrange) | boolean
> range_gist_same(anyrange,anyrange,internal) | internal
> (45 rows)
>
> Now, there's no point in changing range_eq and the later entries in this
> table (the ones taking two anyrange's); our rather lame definition of
> anycompatiblerange means that we'd get no benefit from doing so. But
> I think there's a strong case for changing everything before range_eq.
> It'd be nice if something like
>
> SELECT array[1] < array[1.1];
>
> would work the same way that "SELECT 1 < 1.1" would.
>

That makes sense without any discussion, but it is a little bit different
from using the default value in the LAG function.
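
For illustration (only a sketch of the intended resolution, not tested
against any patch), anycompatiblearray would effectively promote the
integer array the same way plain scalars are promoted:

select 1 < 1.1;                           -- integer is promoted to numeric
select array[1]::numeric[] < array[1.1];  -- roughly what the array case would resolve to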

> I checked the other concern that I had about whether the more flexible
> polymorphic definitions would create any new ambiguities, and I don't
> see any problems with this list. As functions, none of these names are
> overloaded, except with different numbers of arguments so there's no
> ambiguity. Most of the array functions are also operators, but the
> competing operators do not take arrays, so it doesn't look like there's
> any issue on that side either.
>
> So I think we should be more ambitious and generalize all of these
> to use anycompatiblearray etc.
>

+1
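
For concreteness, the generalized signatures would presumably look
something like this (a sketch of the proposal, not actual catalog
entries):

lag(anycompatible, integer, anycompatible)        -> anycompatible
array_append(anycompatiblearray, anycompatible)   -> anycompatiblearray
array_position(anycompatiblearray, anycompatible) -> integer
array_cat(anycompatiblearray, anycompatiblearray) -> anycompatiblearray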

Pavel

>
> regards, tom lane
>
