| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-03-29 14:06:11 |
| Message-ID: | CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP=rLeNhO3iwsHwwbg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Tatsuo,
Separately from the PREV/NEXT patches, I have been analyzing
how FIRST and LAST navigation would interact with the current
NFA engine -- particularly context absorption and DEFINE
evaluation sharing. This is not a proposal to implement them
now; rather, I wanted to document the design constraints
while the analysis is fresh, so we have a reference when we
are ready to add them.
Design Note: FIRST/LAST Navigation for RPR
===========================================
Scope: This note covers only patterns without quantifiers, where
each pattern variable consumes exactly one row.
1. Navigation Function Classification
--------------------------------------
The SQL standard (ISO/IEC 19075-5:2021) defines four navigation
functions in two categories:
PREV/NEXT -- physical navigation:
- Scope: entire row pattern partition
- Offset: physical (row position within partition)
- Base row: the row denoted by the variable under running
semantics (in our no-qualifier scope: current eval row)
- Default offset: 1
- Can reach rows outside the match (standard 5.6.2)
- Always RUNNING semantics only
FIRST/LAST -- logical navigation:
- Standard scope: rows mapped to a specific variable
- Our scope (no quantifier): match_start to currentpos range
- Offset: logical (in standard), but equivalent to physical
when each variable maps to at most one row
- Base: match_start (FIRST) or currentpos (LAST)
- Default offset: 0 (unlike PREV/NEXT which default to 1)
- RUNNING semantics in DEFINE; FINAL available in MEASURES
FIRST/LAST results depend on the match start position
(match_start). Each current row in window RPR triggers an
independent match attempt with its own match_start. FIRST
always depends on match_start. LAST(price) depends only on
currentpos and is match_start-independent, but LAST(price, N)
depends on match_start because the NULL boundary check
(currentpos - N < match_start) varies with match_start.
Example (PATTERN (A B C), match_start = 3):
FIRST(price) = R3.price (match_start row)
FIRST(price, 1) = R4.price (match_start + 1)
LAST(price) = R5.price (currentpos row)
LAST(price, 1) = R4.price (currentpos - 1)
LAST(price, 3) = NULL (before match_start)
PREV(price, 2) = R3.price (pos - 2, physical)
NEXT(price, 2) = R7.price (pos + 2, beyond match)
PREV/NEXT can reach any row in the partition regardless of
match boundaries. FIRST/LAST are valid only within the
match_start to currentpos range.
2. match_start and NFA Contexts
-------------------------------
Each match_start defines one match attempt, which corresponds
to one NFA context. Different contexts have different
match_start values. Context absorption merges contexts when
one context's pattern states are covered by another's. If the
merged contexts have different match_start values, FIRST/LAST
would produce incorrect results because the surviving context's
match_start would be used for both.
3. Compound Navigation (PREV/NEXT wrapping FIRST/LAST)
------------------------------------------------------
The standard (5.6.4) permits nesting FIRST/LAST inside
PREV/NEXT. Using the standard's notation with variable
qualifiers (which our no-quantifier scope does not yet
require):
PREV(LAST(A.Price + A.Tax, 1), 3)
Evaluation proceeds in three steps:
1. Inner FIRST/LAST: determines target row position only
(does not evaluate the expression)
2. Outer PREV/NEXT: applies physical offset from that row
3. Expression evaluation: at the final destination row only
The reverse nesting (FIRST/LAST wrapping PREV/NEXT) and
same-kind nesting (PREV inside PREV, etc.) are prohibited.
Since the inner function only determines a position, the
compound operation can be flattened into a single RPRNavExpr
node internally, requiring only one slot swap.
4. Context Absorption Rules
----------------------------
Absorption safety depends on whether the navigation function
depends on match_start:
DEFINE content | match_start dep. | absorption
--------------------------------|------------------|----------
No navigation | none | safe
PREV/NEXT only | none | safe
LAST (no offset) | none | safe
LAST (with offset) | boundary check | unsafe
FIRST (any) | direct | unsafe
Compound (inner FIRST) | direct | unsafe
Compound (inner LAST, no off.) | none | safe
Compound (inner LAST, with off.)| boundary check | unsafe
The criterion: if the expression depends on match_start in
any way, absorption is unsafe. Comparing match_start values
at runtime is impractical, so we treat any dependency as
unconditionally unsafe.
5. DEFINE Clause Evaluation Rules
---------------------------------
Currently, nfa_evaluate_row(pos) evaluates all DEFINE
variables once per row and shares the results across all NFA
contexts.
DEFINE content | evaluation
--------------------------------|--------------------------
No navigation | shared (once per row)
PREV/NEXT only | shared (once per row)
LAST (no offset) | shared (once per row)
LAST (with offset) | per-context
FIRST (any) | per-context
Compound (inner FIRST) | per-context
Compound (inner LAST, no off.) | shared (once per row)
Compound (inner LAST, with off.)| per-context
6. Slot Swap Elision
--------------------
When the final target position resolves to currentpos, the
slot swap can be elided. This applies to:
- PREV(price, 0), NEXT(price, 0): offset 0
- LAST(price), LAST(price, 0): always currentpos
- Compound cases where offsets cancel, e.g.,
NEXT(LAST(price, 1), 1)
The PREV/NEXT infrastructure (RPRNavExpr, EEOP_RPR_NAV_SET/
RESTORE, nav_winobj) can be reused directly for FIRST/LAST --
it would mainly require adding two RPRNavKind values and
match_start arithmetic in the ExecEvalRPRNavSet switch.
No variable qualifiers, match history tracking, or CLASSIFIER
infrastructure would be needed for this no-quantifier scope.
I do think CLASSIFIER, MEASURES, and match history tracking
should stay out of the initial patch set -- they add
significant complexity to NFA contexts and affect absorption
in ways that are hard to validate incrementally. Variable
qualifiers and quantifier support for navigation functions
are also tied to match history infrastructure, so they should
be deferred as well. Better to get the current feature set
stable first.
With PREV/NEXT now resolved (0006), the next major milestone
would be match history infrastructure (needed for CLASSIFIER,
MEASURES, variable qualifiers, and quantifier-aware
navigation). Before that larger step, there are a few
remaining items that can be done within the current
architecture:
Optimization:
- PREFIX pattern absorption (e.g., A (B+) C) [1][2]
- Fixed-length iteration group absorption
(e.g., (A B{2})+ ) [3]
Feature:
- FIRST/LAST navigation (this note)
- LLVM JIT proper support for slot swap (0006 uses
interpreter fallback) [4][5]
[1]
https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAAAe_zBY%2BEqk_DQpS0cy1Eb67H9v92tyaf%2BU%2BSKKuLGUs_z%2BEA%40mail.gmail.com
[4]
https://www.postgresql.org/message-id/CAAAe_zBbrnx2fjK2s%2BJgx6TSOdnKAPawXbHeX49WqmX9ji%2BHdg%40mail.gmail.com
[5]
https://www.postgresql.org/message-id/CAAAe_zBCF3dwSjStmG0kJqw_y1z8QD73Rf1G58QTKEvd9tScwA%40mail.gmail.com
The question is how much of this to complete before the
current patches are committed. The patch set is already
substantial, and the next CommitFest is in July. What do
you think -- should we try to include some of these items
in the current round, or focus on getting what we have
committed first? Also, do you see any blockers that might
prevent the current patch set from being committed?
Best regards,
Henson
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Pavel Stehule | 2026-03-29 16:42:46 | PoC - psql - emphases line with table name in verbose output |
| Previous Message | Andrew Jackson | 2026-03-29 13:55:54 | Re: Add ldapservice connection parameter |