Re: Add RANGE with values and exclusions clauses to the Window Functions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oliver Ford <ojford(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add RANGE with values and exclusions clauses to the Window Functions
Date: 2018-02-01 00:06:00
Message-ID: 15869.1517443560@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oliver Ford <ojford(at)gmail(dot)com> writes:
> [ 0001-window-frame-v11.patch ]

I've realized that the exclusion clause aspect of this patch is rather
badly broken. In particular, the "seek to row" logic in
WinGetFuncArgInFrame is critically dependent on the assumption that the
rows of the frame are contiguous. Use of an EXCLUDE option makes them
not contiguous, but that doesn't mean you can just return NULL if the
seek hits one of the excluded rows. The way the spec is written, it's
pretty clear that e.g. first_value() should be the value from the first
row that survives all exclusions. But as this is coded, if the first
row that'd otherwise be in frame is excluded by EXCLUDE, you'll get
NULL, not the value from the first row that isn't excluded. An example
of getting the wrong results:

regression=# select x, first_value(x) over (order by x rows between
current row and 1 following exclude current row)
from generate_series(1,10) x;
x | first_value
----+-------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
(10 rows)

We could imagine reimplementing WinGetFuncArgInFrame to fix this, but
aside from the sheer inefficiency of simple fixes, I'm not very clear
what seeking relative to WINDOW_SEEK_CURRENT should mean when the current
row is excluded. (Of course, the current row could have been out of frame
before too. Maybe we should just get rid of WINDOW_SEEK_CURRENT?)

I'm a bit tempted to rip out the exclusion-clause support and leave the
topic to be revisited later. It'd have been better done as a separate
patch anyhow IMO, since it seems quite orthogonal to the RANGE or GROUPS
options. (And TBH, given the lack of field demand for it, I'm not sure
that we want to pay a complexity and performance price for it.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-02-01 00:10:19 Re: [HACKERS] Surjective functional indexes
Previous Message David Rowley 2018-01-31 23:57:41 Re: [HACKERS] path toward faster partition pruning