Re: Add RANGE with values and exclusions clauses to the Window Functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oliver Ford <ojford(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add RANGE with values and exclusions clauses to the Window Functions
Date: 2018-01-09 21:59:19
Message-ID: 13279.1515535159@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oliver Ford <ojford(at)gmail(dot)com> writes:
> [ 0001-window-frame-v6.patch ]

Generally speaking, Postgres tries hard to be an extensible-datatype
system, going beyond the SQL standard's minimum requirements when
necessary to make it so. The reason that we don't already have RANGE
PRECEDING/FOLLOWING support is that nobody was satisfied with only
making it work for integers and datetimes. There was, as I recall, code
implementing more or less what you've got here in the original window
function submission, and we pulled it out before committing because of
that inadequacy. I don't think the fact that some years have gone by
means that we should forget about keeping the feature extensible.

One subsequent discussion about how we might make it work to project
standards was here:

https://www.postgresql.org/message-id/flat/51C3B952.60907%402ndquadrant.com

Looking back at that, I notice that we all focused on the way to identify
a suitable "+" or "-" operator, but now I'm thinking that that's not
actually a good factorization, because it'd be subject to undesirable
overflow hazards. That is, if we have an integer sequence like

2147483640
2147483641
2147483642
2147483643
2147483644

and we operate on this with "RANGE FOLLOWING 10", that approach results
in an integer overflow when we try to calculate the limit values. But
there's no real need for an overflow error. Ideally, if we try to form
2147483640 + 10 and notice it's overflowed, we'd treat the bound as
+infinity, because every non-overflowed integer value must be within range.

So the approach I'm imagining now is a datatype-specific support function
along the lines of

in_range(a, b, delta) returns bool

which is supposed to return true if a <= b + delta, or something along
that line --- exact details of the definition TBD --- with the proviso
that if b + delta would overflow then the result is automatically true.

We could probably also delegate the requirement of throwing an error
for negative delta to this function, eliminating the need for the
datatype-independent core code to know how to tell that, which is the
other datatype-dependent behavior needed per spec.

Likely there are two of these, one each for the PRECEDING and FOLLOWING
cases.

As suggested in the above-mentioned thread, we could attach such functions
as support functions in the btree opclass that defines the sort order of
the window frame's ordering column, and the core code could look it up
from there. Extensibility would come from the fact that people can define
new opclasses. Also, I believe we could support multiple such functions
per opclass, allowing the potential to support "delta"s of different
datatypes --- pg_amproc.amproclefttype would correspond to the common
type of a and b, while pg_amproc.amprocrighttype would correspond to
the data type of delta. We certainly need to allow delta to be a
different type from a/b just to handle the spec's timestamp cases.
I'm not sure if there's near-term value in multiple types of delta
values, but it seems easy to allow in this framework.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-09 22:01:20 Re: [HACKERS] pgbench more operators & functions
Previous Message Andres Freund 2018-01-09 21:46:55 Re: BUG #14941: Vacuum crashes