Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls

From: Nicholas White <n(dot)j(dot)white(at)gmail(dot)com>
To: Troels Nielsen <bn(dot)troels(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
Date: 2013-06-24 17:01:31
Message-ID: CA+=vxNZyfOW29HJ=KO8V7DRDVeJvjDm5NX-qswYe=k_b9BcDNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Good catch - I've attached a patch to address your point 1. It now returns
the below (i.e. correctly doesn't fill in the saved value if the index is
out of the window. However, I'm not sure whether (e.g.) lead-2-ignore-nulls
means count forwards two rows, and if that's null use the last one you've
seen (the current implementation) or count forwards two non-null rows (as
you suggest). The behaviour isn't specified in a (free) draft of the 2003
standard (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have
access to the (non-free) final version. Could someone who does have access
to it clarify this? I've also added your example to the regression test
cases.

select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 |
7 |
(10 rows)

If the other reviewers are happy with your grammar changes then I'll merge
them into the patch. Alternatively, if departing from the standard is OK
then we could reorder the keywords so that a window function is like SELECT
lag(x,1) OVER RESPECT NULLS (ORDER BY y) - i.e. putting the respect /
ignore tokens after the OVER reserved keyword. Although non-standard it'd
make the grammar change trivial.

> Also, I think someone mentioned this already, but what about
> first_value() and last_value()? Shouldn't we do those at the same time?

I didn't include this functionality for the first / last value window
functions as their implementation is currently a bit different; they just
call WinGetFuncArgInFrame to pick out a single value. Making these
functions respect nulls would involve changing the single lookup to a walk
through the tuples to find the first non-null version, and keeping track of
this index in a struct in the context. As this change is reasonably
orthogonal I was going to submit it as a separate patch.

Thanks -

Attachment Content-Type Size
lead-lag-ignore-nulls.patch application/octet-stream 23.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2013-06-24 17:02:11 Re: [9.4 CF 1] The Commitfest Slacker List
Previous Message Josh Berkus 2013-06-24 16:57:45 Re: [9.4 CF 1] The Commitfest Slacker List