Re: Add RANGE with values and exclusions clauses to the Window Functions

From: Oliver Ford <ojford(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Erik Rijkers <er(at)xs4all(dot)nl>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add RANGE with values and exclusions clauses to the Window Functions
Date: 2018-02-02 16:26:43
Message-ID: CAGMVOduju1k4kbmsjCbYHDv=ghTf3X05sGzMtDTPzsjXUmr05Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 1, 2018 at 1:46 AM, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> On Wed, Jan 31, 2018 at 5:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> We could imagine reimplementing WinGetFuncArgInFrame to fix this, but
>> aside from the sheer inefficiency of simple fixes, I'm not very clear
>> what seeking relative to WINDOW_SEEK_CURRENT should mean when the current
>> row is excluded. (Of course, the current row could have been out of frame
>> before too. Maybe we should just get rid of WINDOW_SEEK_CURRENT?)
>>
>
> The exclusion clause is frame-specific and none of the three frame callers
> use WINDOW_SEEK_CURRENT (only the single partition caller does). So unless
> there is an external code concern removing WINDOW_SEEK_CURRENT from being
> valid for WinGetFuncArgInFrame seems straight forward and probably could be
> done to remove dead code whether frame exclusion is implemented or not. And
> it should remain dead since, as you say, in a frame context the current row
> may not be represented even today.

Attached patch makes WINDOW_SEEK_CURRENT invalid for
WinGetFuncArgInFrame, and adds
Exclude-specific code so that Tom's query now gives the desired results:

select x, first_value(x) over (order by x rows between
current row and 1 following exclude current row)
from generate_series(1,10) x;
x | first_value
----+-------------
1 | 2
2 | 3
3 | 4
4 | 5
5 | 6
6 | 7
7 | 8
8 | 9
9 | 10
10 |
(10 rows)

The only null row now is the last one, as only itself is in frame due
to the start being
the current row. Attached patch adds extra tests for both
"first_value" and "last_value" to
test both seek modes, and tests all three Exclude options. A similar
query for "last_value"
gives:

select x, last_value(x) over (order by x rows between
1 preceding and current row exclude current row)
from generate_series(1,10) x;
x | last_value
----+------------
1 |
2 | 1
3 | 2
4 | 3
5 | 4
6 | 5
7 | 6
8 | 7
9 | 8
10 | 9
(10 rows)

So here, the only null is the first row because it doesn't have a
preceding row and
is itself excluded.

> The three callers of WinGetFuncArgInFrame don't use the isout argument; they
> probably need to read that and a new isexcluded argument. Start at the
> head, loop until isout = true || isexcluded = false.

The patch takes a slightly different approach and puts the logic in
WinGetFuncArgInFrame.
The "row_is_in_frame" function now returns a specific return code for
when an Exclude
clause was matched. When that's returned to WinGetFuncArgInFrame, it
tries the next
row until it gets one that doesn't match an Exclude clause (either
because it's in or out of
frame).

> You could create a partition/frame that retains its contiguous property but
> you then need to map multiple original row positions onto the single frame
> rows that denote the head and tail positions for each. This seems
> considerably more bug-prone; but I don't really have a feel for how sheer-ly
> inefficient the iteration would be (assuming it is even plausible).

In the patch there's only iteration if we match the Exclude clause, and just a
couple of extra if-statements which only do integer comparisons, so I
don't think
there's much performance impact.

Attachment Content-Type Size
0001-window-frame-v12.patch application/octet-stream 173.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-02-02 16:28:08 Re: Built-in connection pooling
Previous Message Peter Geoghegan 2018-02-02 16:16:50 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)