Re: Questionable result from lead(0) IGNORE NULLS

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: ojford(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Questionable result from lead(0) IGNORE NULLS
Date: 2025-10-08 08:00:07
Message-ID: 20251008.170007.1614484426305151178.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Oliver,

After studying the standard more, it seems I was totally wrong.
In summary, current code is correct. Sorry for noise.

From the standard explaining lead():

> B) If OFFSET = 0 (zero), then the value of <window function> is the
> value of VE1 evaluated for the current row.
(here VE1 referes to the first argument of lead()).

So it seems we need to treat offset==0 case specially. i.e. eval VE1
and do not through away even if the result is NULL.

> I noticed a questionable result from "lead(0) IGNORE NULLS".
>
> CREATE TEMP TABLE g(x INT, y INT);
> CREATE TABLE
> INSERT INTO g (VALUES(NULL,1),(NULL,2),(1,3));
> INSERT 0 3
> SELECT * FROM g;
> x | y
> ---+---
> | 1
> | 2
> 1 | 3
> (3 rows)
>
> SELECT x, y, lead(x, 0) RESPECT NULLS OVER w FROM g
> WINDOW w AS (ORDER BY y);
> x | y | lead
> ---+---+------
> | 1 |
> | 2 |
> 1 | 3 | 1
> (3 rows)
>
> SELECT x, y, lead(x, 0) IGNORE NULLS OVER w FROM g
> WINDOW w AS (ORDER BY y);
> x | y | lead
> ---+---+------
> | 1 |
> | 2 |
> 1 | 3 | 1
> (3 rows)
>
> As you can see, "lead(x, 0) IGNORE NULLS" shows the same result as
> "lead(x, 0) RESPECT NULLS". IMO "lead(x, 0) IGNORE NULLS" should show
> something like:

No. The result above is perfectly correct.

> if (winobj->ignore_nulls == IGNORE_NULLS && relpos != 0)

Now I understand why "relpos != 0" part is necessary.

Also, my email below was wrong.

> While working on this, I found some of window function regression
> tests using lead/lag are not quite correct. Below is some of them.
>
> -- lead
> SELECT name,
> orbit,
> lead(orbit) OVER w AS lead,
> lead(orbit) RESPECT NULLS OVER w AS lead_respect,
> lead(orbit) IGNORE NULLS OVER w AS lead_ignore
> FROM planets
> WINDOW w AS (ORDER BY name)
> ;
> name | orbit | lead | lead_respect | lead_ignore
> ---------+-------+-------+--------------+-------------
> earth | | 4332 | 4332 | 4332
> jupiter | 4332 | | | 88
> mars | | 88 | 88 | 88
> mercury | 88 | 60182 | 60182 | 60182
> neptune | 60182 | 90560 | 90560 | 90560
> pluto | 90560 | 24491 | 24491 | 24491
> saturn | 24491 | | | 224
> uranus | | 224 | 224 | 224
> venus | 224 | | |
> xyzzy | | | |
> (10 rows)
>
> Why lead_ignore shows "4332" on the first row? Since "orbit"'s second
> non null row is orbit==88, I think lead(orbit) should return 88,
> rather than 4332 if my understanding of the SQL standard is correct.

According to the standard if OFFSET is not 0, then:
> B) Otherwise, let TX be the sequence of values that is the result of applying VE1 to each
> row of T that follows the current row and eliminating null values, ordered according
> to the window ordering of WDX.

> C) Otherwise, the value of <window function> is the m-th value of
> TX, where m = OFFSET.

Thus TX does not include current row and for the first row, and
lead(orbit) returns the second row, that is orbit==4332 because m==1.
So, current regression test expected file is correct.

Again, sorry for noise.
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message v@viktorh.net 2025-10-08 08:51:39 Re: Allow ON CONFLICT DO UPDATE to return EXCLUDED values
Previous Message Jeremy Schneider 2025-10-08 07:06:12 Re: sync_standbys_defined and pg_stat_replication