Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-01-12 10:45:07
Message-ID: 20260112.194507.364121646583109399.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Henson,

> Hi Ishii-san,
>
> I've been working on the pattern grammar and AST for row pattern
> recognition.
> The attached patch extends the PATTERN clause to support more regex-like
> syntax.
>
> The main additions are:
>
> - Alternation: (A | B)
> - Grouping with quantifiers: (A B)+, (A | B)*
> - Full quantifier support: +, *, ?, {n}, {n,}, {n,m}
> - Reluctant quantifiers: +?, *?, ?? (parsed but not executed yet)

Excellent!

> I've introduced RPRPatternNode to represent the pattern as a Thompson-style
> AST with four node types: VAR, SEQ, ALT, and GROUP. Each node stores min/max
> bounds for quantifiers and a reluctant flag. This should make the eventual
> NFA implementation more straightforward.
>
> The parser also does some basic pattern optimizations. For example:
> - (A (B C)) gets flattened to (A B C)
> - ((A)) unwraps to just A
> - (A{2}){3} becomes A{6}
> - (A | B | A) simplifies to (A | B)
> - A A A merges into A{3,3}
>
> I also updated ruleutils.c to deparse patterns properly.
>
>
> Current state:
>
> The executor still uses regex matching, which works but has some
> limitations.
> Complex patterns with lots of alternations can be slow.
>
> The 26-variable limit is still there since we're mapping to [a-z] for the
> regex.
>
> Reluctant quantifiers (+?, *?, ??) are parsed but will raise an error if you
> try to use them. They'll need proper NFA support to work correctly.
>
> Some optimization code was removed in the process, but it should work better
> once NFA matching is in place.

Yeah. Now the rpr regression takes more than 7 seconds. (with current
v37 patch it takes 86ms). I expect NFA matching works better.

> Next steps:
>
> The plan is to replace regex matching with NFA-based execution. The AST
> structure is already set up for that.

Sounds promising.

> The first step is converting the Thompson AST into a flat representation -
> flattening pattern elements and building actual NFA states with transitions,
> then replacing the regex engine with direct NFA matching. This should
> handle all
> the pattern syntax properly, including reluctant quantifiers.
>
> Once that's working, the next step is handling multiple contexts during
> matching -
> tracking which states are active and absorbing context as we go through
> rows.
> This is essential for proper pattern matching behavior.

Ok.

> After that, I'll implement PATH and CLASSIFIER functions, which depend on
> having
> the match context available.

What is PATH function? It's not in R010 or R020 as far as I know.

> Then there's room for various optimizations - PATH space optimization,
> incremental aggregate computation over matched rows, better state merging,
> etc.

Sounds good.

> Longer-term, the goal is to get to MATCH_RECOGNIZE in the FROM clause for
> full SQL standard compliance.

What about MEASURES clause? Without it, many things in the standard
cannot be implemented.

> Let me know if you have any concerns or suggestions about the approach.

Here are some comments on your patches.

- It is based on my v36 patches. But the latest one is v37 patch. It
would be nice if you create patches based on the my latest patches.

- You are doing some optimization (like (A (B C)) gets flattened to (A
B C) etc.) in the parse analysis phase. I think doing that kind of
optimization should be done in the optimizer/planner. Because with
your patch a deparsed query (as shown by pg_get_viewdef()) looks
different from what user inputs.

- If you add files to src/backend/parser, it should be noted in
src/backend/parser/README.

- It would be noce to update typedefs patch if you add new typedefs.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Vinter 2026-01-12 10:50:51 Re: Resetting recovery target parameters in pg_createsubscriber
Previous Message Roman Khapov 2026-01-12 10:33:05 Re: amcheck: support for GiST