Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-03-07 03:01:51
Message-ID: 20260307.120151.1477244845022229828.ishii@postgresql.org
Lists: pgsql-hackers

Hi Henson,

> Hi, Tatsuo
>
> Does "a zero-length match" mean "an empty match"?
>>
>
> Yes, they refer to the same thing. "Zero-length match" is the more
> common term in general regex implementations (PCRE2, Perl, Python,
> Java, etc.[1]), but the RPR standard (ISO/IEC 19075-5, Section 4.12.2)
> uses "empty match" exclusively.
>
> [1] https://www.regular-expressions.info/zerolength.html

I found that Trino uses "empty match" too [2]. So I guess "empty
match" is the more familiar wording for SQL users.

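For anyone less used to the regex term, here is a small sketch of what
a zero-length (empty) match looks like in a conventional regex engine.
It assumes POSIX <regex.h> is available and has nothing to do with the
RPR patch itself; match_length() is just a hypothetical helper for
illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: length of the leftmost match of "pattern" in
 * "text", or -1 if there is no match (or the pattern fails to compile).
 */
static int
match_length(const char *pattern, const char *text)
{
	regex_t		re;
	regmatch_t	m;
	int			len = -1;

	assert(pattern != NULL && text != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* A successful match with rm_so == rm_eo is a zero-length match. */
	if (regexec(&re, text, 1, &m, 0) == 0)
		len = (int) (m.rm_eo - m.rm_so);

	regfree(&re);
	return len;
}
```

With this helper, "a*" against "bbb" succeeds with length 0 (an empty
match), while "a+" against "bbb" does not match at all. RPR draws the
same distinction between an empty match and an unmatched row.
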
> Yes, we should follow master's convention. I see three options:
>
> (a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
> keep the "API exposed to window functions" section at the bottom,
> matching master's layout.
>
> (b) Separate file under src/backend/executor/, keeping it close to
> nodeWindowAgg.c while making the boundary explicit.
>
> (c) A dedicated src/backend/rpr/ directory modeled on
> src/backend/regex/, giving the NFA engine its own namespace.
> This could also be an opportunity to consolidate the existing
> src/backend/optimizer/plan/rpr.c into the same directory.
>
> For now (a) is the safest change. Longer term, (b) or (c) would make
> more sense -- especially when we extend to MATCH_RECOGNIZE (R010),
> where the NFA engine will need to be shared across both code paths.
> Either way, the NFA engine can be exposed via a header so that R010
> can share it without further restructuring.
>
> Since the NFA algorithm is not familiar territory for most DBMS
> developers, it would also be worth preserving the detailed algorithm
> description posted earlier in this thread -- either as structured
> comments or as a dedicated README alongside the code.
>
> What do you think? Should we start with (a) now and revisit the
> broader restructuring approaches -- (b) or (c) -- later, or would you
> prefer to discuss them first? Either of those would also resolve the
> file layout convention issue naturally, since new files would follow
> proper conventions from the start.

I prefer (a) or (b) for now, at least for the first commit. The reason
is that the current nfa_* functions take a WindowAggState argument. If
we go with (c), I think we would need to change some (or most) of the
nfa_* functions so that they no longer take the WindowAggState
argument. What do you think?
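
To make the concern concrete, below is a rough sketch of one way an
engine could avoid depending on WindowAggState: the caller passes an
opaque context plus a callback that evaluates a DEFINE condition. All
names here (RprNfaCallbacks, rpr_count_run, DemoCallerState,
demo_is_rising) are hypothetical and do not exist in the patch, and the
"engine" is only a toy that counts a greedy run of rows for a single
pattern variable. It is meant to show the shape of the interface, not
the actual NFA algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: does "row" satisfy the DEFINE of var_id? */
typedef bool (*rpr_eval_fn) (void *caller_ctx, int64_t row, int var_id);

/* Hypothetical interface between the engine and its caller. */
typedef struct RprNfaCallbacks
{
	rpr_eval_fn eval_define;
	void	   *caller_ctx;		/* opaque; could be a WindowAggState * */
} RprNfaCallbacks;

/*
 * Toy engine for a pattern like "A*": count consecutive rows from
 * start_row that satisfy var_id.  Returning 0 corresponds to an empty
 * match.  Note that it never looks inside caller_ctx.
 */
static int64_t
rpr_count_run(const RprNfaCallbacks *cb, int64_t start_row,
			  int64_t nrows, int var_id)
{
	int64_t		row = start_row;

	assert(cb != NULL && cb->eval_define != NULL);

	while (row < nrows && cb->eval_define(cb->caller_ctx, row, var_id))
		row++;
	return row - start_row;
}

/* Stand-in for the executor side (assumption, not WindowAggState). */
typedef struct DemoCallerState
{
	const int  *values;
} DemoCallerState;

/* Example DEFINE condition: value > PREV(value). */
static bool
demo_is_rising(void *ctx, int64_t row, int var_id)
{
	const DemoCallerState *st = ctx;

	(void) var_id;				/* only one variable in this demo */
	return row > 0 && st->values[row] > st->values[row - 1];
}
```

For example, with values {1, 2, 3, 2}, starting at row 1 gives a run of
2 rows, and starting at row 0 or row 3 gives 0. With (a) or (b) the
nfa_* functions can keep taking WindowAggState directly, so nothing
like this is needed for the first commit.
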

> One more thing: there are no ECPG example programs or regression tests
> for RPR yet. I'd like to propose adding them. Shall I draft an
> initial set, or would you prefer to coordinate with the ECPG
> maintainers first?

I am not familiar with ECPG. Do you know whether ECPG has any window
clause tests? If it does not, is it worth adding RPR tests to ECPG?

[2] https://trino.io/docs/current/sql/match-recognize.html#evaluating-expressions-in-empty-matches-and-unmatched-rows

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
