Re: Compatible defaults for LEAD/LAG

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Vik Fearing <vik(at)postgresfriends(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Compatible defaults for LEAD/LAG
Date: 2020-08-30 21:59:39
Message-ID: 3902163.1598824779@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

So this deviation from spec doesn't bother me; we have much bigger ones.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion. I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

oid | prorettype
-----------------------------------------------+------------
lag(anyelement,integer,anyelement) | anyelement
lead(anyelement,integer,anyelement) | anyelement
width_bucket(anyelement,anyarray) | integer
btarraycmp(anyarray,anyarray) | integer
array_eq(anyarray,anyarray) | boolean
array_ne(anyarray,anyarray) | boolean
array_lt(anyarray,anyarray) | boolean
array_gt(anyarray,anyarray) | boolean
array_le(anyarray,anyarray) | boolean
array_ge(anyarray,anyarray) | boolean
array_append(anyarray,anyelement) | anyarray
array_prepend(anyelement,anyarray) | anyarray
array_cat(anyarray,anyarray) | anyarray
array_larger(anyarray,anyarray) | anyarray
array_smaller(anyarray,anyarray) | anyarray
array_position(anyarray,anyelement) | integer
array_position(anyarray,anyelement,integer) | integer
array_positions(anyarray,anyelement) | integer[]
array_remove(anyarray,anyelement) | anyarray
array_replace(anyarray,anyelement,anyelement) | anyarray
arrayoverlap(anyarray,anyarray) | boolean
arraycontains(anyarray,anyarray) | boolean
arraycontained(anyarray,anyarray) | boolean
elem_contained_by_range(anyelement,anyrange) | boolean
range_contains_elem(anyrange,anyelement) | boolean
range_eq(anyrange,anyrange) | boolean
range_ne(anyrange,anyrange) | boolean
range_overlaps(anyrange,anyrange) | boolean
range_contains(anyrange,anyrange) | boolean
range_contained_by(anyrange,anyrange) | boolean
range_adjacent(anyrange,anyrange) | boolean
range_before(anyrange,anyrange) | boolean
range_after(anyrange,anyrange) | boolean
range_overleft(anyrange,anyrange) | boolean
range_overright(anyrange,anyrange) | boolean
range_union(anyrange,anyrange) | anyrange
range_merge(anyrange,anyrange) | anyrange
range_intersect(anyrange,anyrange) | anyrange
range_minus(anyrange,anyrange) | anyrange
range_cmp(anyrange,anyrange) | integer
range_lt(anyrange,anyrange) | boolean
range_le(anyrange,anyrange) | boolean
range_ge(anyrange,anyrange) | boolean
range_gt(anyrange,anyrange) | boolean
range_gist_same(anyrange,anyrange,internal) | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so. But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list. As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity. Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-08-30 23:20:01 Re: New default role- 'pg_read_all_data'
Previous Message Tom Lane 2020-08-30 18:50:07 Re: Deprecating postfix and factorial operators in PostgreSQL 13