Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-01-12 06:44:18
Message-ID: CAAAe_zChxC051WsfPrfTaJ0N362QPkj3vLnQ9pbAC-eYeco9NQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Subject: Re: Row pattern recognition

Hi Ishii-san,

I've been working on the pattern grammar and AST for row pattern
recognition.
The attached patch extends the PATTERN clause to support more regex-like
syntax.

The main additions are:

- Alternation: (A | B)
- Grouping with quantifiers: (A B)+, (A | B)*
- Full quantifier support: +, *, ?, {n}, {n,}, {n,m}
- Reluctant quantifiers: +?, *?, ?? (parsed but not executed yet)

I've introduced RPRPatternNode to represent the pattern as a Thompson-style
AST with four node types: VAR, SEQ, ALT, and GROUP. Each node stores min/max
bounds for quantifiers and a reluctant flag. This should make the eventual
NFA implementation more straightforward.

The parser also does some basic pattern optimizations. For example:
- (A (B C)) gets flattened to (A B C)
- ((A)) unwraps to just A
- (A{2}){3} becomes A{6}
- (A | B | A) simplifies to (A | B)
- A A A merges into A{3,3}

I also updated ruleutils.c to deparse patterns properly.

Current state:

The executor still uses regex matching, which works but has some
limitations.
Complex patterns with lots of alternations can be slow.

The 26-variable limit is still there since we're mapping to [a-z] for the
regex.

Reluctant quantifiers (+?, *?, ??) are parsed but will raise an error if you
try to use them. They'll need proper NFA support to work correctly.

Some optimization code was removed in the process, but it should work better
once NFA matching is in place.

Next steps:

The plan is to replace regex matching with NFA-based execution. The AST
structure is already set up for that.

The first step is converting the Thompson AST into a flat representation -
flattening pattern elements and building actual NFA states with transitions,
then replacing the regex engine with direct NFA matching. This should
handle all
the pattern syntax properly, including reluctant quantifiers.

Once that's working, the next step is handling multiple contexts during
matching -
tracking which states are active and absorbing context as we go through
rows.
This is essential for proper pattern matching behavior.

After that, I'll implement PATH and CLASSIFIER functions, which depend on
having
the match context available.

Then there's room for various optimizations - PATH space optimization,
incremental aggregate computation over matched rows, better state merging,
etc.

Longer-term, the goal is to get to MATCH_RECOGNIZE in the FROM clause for
full SQL standard compliance.

Let me know if you have any concerns or suggestions about the approach.

Best regards,
Henson

Attachment Content-Type Size
0001-grammar-and-AST-structure.txt text/plain 106.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message zengman 2026-01-12 06:53:24 Re: Use correct macro for accessing offset numbers.
Previous Message Konstantin Knizhnik 2026-01-12 06:35:17 Re: global temporary table (GTT) - are there some ideas how to implement it?