================================================================================ PostgreSQL Row Pattern Recognition: Flat-Array Stream NFA Guide ================================================================================ Target audience: Developers with a basic understanding of the PostgreSQL executor and planner architecture Scope: The entire process from PATTERN/DEFINE clause parsing to NFA runtime execution Related code: - src/backend/parser/parse_rpr.c (parser phase) - src/backend/optimizer/plan/rpr.c (optimizer phase) - src/backend/executor/nodeWindowAgg.c (executor phase) - src/include/nodes/plannodes.h (plan node definitions) - src/include/nodes/execnodes.h (execution state definitions) - src/include/optimizer/rpr.h (types and constants) ================================================================================ What is a Flat-Array Stream NFA? The NFA in this implementation is not a traditional state-transition graph but a flat array of fixed-size 16-byte elements. At runtime, it processes the row stream in a forward-only manner, expanding epsilon transitions eagerly without backtracking. - Flat-Array: Pattern compiled into a flat array, not a graph (Chapter IV) - Stream: Rows consumed sequentially in one direction, never revisited (Chapter XII) - NFA: Nondeterministic execution where multiple states coexist within a single context (Chapter VI) Chapter I Row Pattern Recognition Overview ================================================================================ Row Pattern Recognition (hereafter RPR) is a feature introduced in SQL:2016 that matches regex-based patterns against ordered row sets. The SQL standard defines two forms: Feature R010: MATCH_RECOGNIZE (FROM clause) - Dedicated table operator - Provides dedicated functions such as MATCH_NUMBER(), CLASSIFIER() - Supports ONE ROW PER MATCH / ALL ROWS PER MATCH Feature R020: RPR in a window (WINDOW clause) - Integrated into the existing window function framework - Supports ALL ROWS PER MATCH only - No MATCH_NUMBER() This implementation targets Feature R020. The basic syntax is as follows: SELECT ... OVER ( PARTITION BY ... ORDER BY ... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING [INITIAL | SEEK] -- SEEK is defined in the standard but not implemented AFTER MATCH SKIP TO NEXT ROW | SKIP PAST LAST ROW PATTERN ( ) DEFINE AS , ... ) The PATTERN clause is a regular expression over row pattern variables. The DEFINE clause specifies boolean conditions that determine whether each variable evaluates to true for the current row. Example: PATTERN (A+ B) DEFINE A AS price > PREV(price), B AS price < PREV(price) This pattern matches "a span where prices rise consecutively then drop." Chapter II Overall Processing Pipeline ================================================================================ RPR processing is divided into three phases: +------------------------------------------------------------+ | 1. Parsing (Parser) | | SQL text -> PATTERN AST + DEFINE expression tree | | | | 2. Compilation (Optimizer/Planner) | | PATTERN AST -> optimization -> flat NFA element array | | | | 3. Execution (Executor) | | Row-by-row matching via NFA simulation | +------------------------------------------------------------+ Each phase uses independent data structures, and the interfaces between phases are well-defined: Parser -> Planner: WindowClause.rpPattern (RPRPatternNode tree) WindowClause.defineClause (TargetEntry list) Planner -> Executor: WindowAgg.rpPattern (RPRPattern struct) WindowAgg.defineClause (TargetEntry list) Chapter III Parsing Phase ================================================================================ III-1. Entry Point transformWindowDefinitions() (parse_clause.c) +-- transformRPR() (parse_rpr.c) transformRPR() is invoked when RPCommonSyntax is present and performs the following: (1) Frame option validation - Only ROWS is allowed (RANGE, GROUPS are not) - The start boundary must be CURRENT ROW - EXCLUDE option is not allowed (2) Transcription to WindowClause - Copies rpPattern, rpSkipTo, initial fields (3) DEFINE clause transformation (transformDefineClause) III-2. PATTERN AST The parser transforms the PATTERN clause into an RPRPatternNode tree. Each node has one of the following four types: RPR_PATTERN_VAR Variable reference. Name stored in varName field. RPR_PATTERN_SEQ Concatenation. Children node list in children. RPR_PATTERN_ALT Alternation. Branch node list in children. RPR_PATTERN_GROUP Group (parentheses). Body node list in children. All nodes have min/max fields to express quantifiers: A -> VAR(A, min=1, max=1) A+ -> VAR(A, min=1, max=INF) A* -> VAR(A, min=0, max=INF) A? -> VAR(A, min=0, max=1) A{3,5} -> VAR(A, min=3, max=5) If the reluctant field is not -1, the quantifier is reluctant (non-greedy). Example: PATTERN ((A+ B) | C*) ALT +-- SEQ | +-- VAR(A, 1, INF) | +-- VAR(B, 1, 1) +-- VAR(C, 0, INF) III-3. DEFINE Clause Transformation transformDefineClause() processes each DEFINE variable as follows: (1) Checks for duplicate variable names (2) Transforms the expression into a standard SQL expression (3) Coerces to Boolean type (coerce_to_boolean) (4) Wraps in a TargetEntry with the variable name set in resname Variables that are used in PATTERN but not defined in DEFINE are implicitly evaluated as TRUE (matching all rows). Chapter IV Compilation Phase ================================================================================ IV-1. Entry Point create_windowagg_plan() (createplan.c) +-- collectPatternVariables() Collect variable names +-- filterDefineClause() Remove unused DEFINE entries +-- buildRPRPattern() NFA compilation (6 phases) IV-2. The 6 Phases of buildRPRPattern() Phase 1: AST optimization (optimizeRPRPattern) Phase 2: Statistics collection (scanRPRPattern) Phase 3: Memory allocation (allocateRPRPattern) Phase 4: NFA element fill (fillRPRPattern) Phase 5: Finalization (finalizeRPRPattern) Phase 6: Absorbability analysis (computeAbsorbability) IV-3. Phase 1: AST Optimization After copying the parser-generated AST, the following optimizations are applied: (a) SEQ flattening: Unwrap nested SEQ nodes SEQ(A, SEQ(B, C)) -> SEQ(A, B, C) (b) Consecutive variable merging: Merge consecutive occurrences of the same variable into a single quantifier A A -> A{2} A{2,3} A{1,2} -> A{3,5} (c) Consecutive group merging: Merge repeated identical groups (A B)+ (A B)+ -> (A B){2,INF} (d) Consecutive ALT merging: Merge repeated identical ALT nodes (A | B) (A | B) (A | B) -> (A | B){3} (e) Prefix/suffix absorption: Absorb identical sequences before/after a group A B (A B)+ -> (A B){2,INF} (f) ALT flattening and deduplication (A | (B | C)) -> (A | B | C) (A | B | A) -> (A | B) (g) Quantifier multiplication: Collapse nested quantifiers when safe (A+)+ -> A+ (A{2,3}){5} -> A{10,15} (h) Single-child unwrap SEQ(A) -> A, (A){1,1} -> A IV-4. Phase 4: NFA Element Array Generation Transforms the optimized AST into a flat array of RPRPatternElement. This is the core data structure used for NFA simulation at runtime. RPRPatternElement struct (16 bytes): Field Size Description --------------------------------------------------------- varId 1B Variable ID (0-251) or control code (252-255) depth 1B Group nesting depth flags 1B Bit flags (see below) reserved 1B Padding min 4B Quantifier lower bound max 4B Quantifier upper bound next 2B Next element index (sequential flow) jump 2B Branch target index (for ALT/GROUP) Control codes: RPR_VARID_BEGIN (252) Group start marker RPR_VARID_END (253) Group end marker RPR_VARID_ALT (254) Alternation start marker RPR_VARID_FIN (255) Pattern completion marker Element flags (1 byte, bitmask): Bit Constant Set on Description -------------------------------------------------------------------------- 0x01 RPR_ELEM_RELUCTANT VAR, BEGIN, Non-greedy quantifier. END Prefers shorter match: try exit-loop first, then repeat. Set on VAR for simple (A+?), on BEGIN+END for group ((...)+?) 0x02 RPR_ELEM_EMPTY_LOOP END Group body can produce empty match (all children nullable). Creates a fast-forward exit clone alongside the normal loop-back so cycle detection doesn't kill legitimate matches. (IV-4b) 0x04 RPR_ELEM_ABSORBABLE_BRANCH VAR, BEGIN, Element lies within an END, ALT absorbable region. Used at runtime to track whether the current NFA state is in an absorbable context. 0x08 RPR_ELEM_ABSORBABLE VAR, END Absorption judgment point. WHERE to compare consecutive iterations for absorption. - Simple unbounded VAR (A+): set on the VAR itself - Unbounded GROUP ((A B)+): set on the END element only Accessor macros: RPRElemIsReluctant(e) (e)->flags & 0x01 RPRElemCanEmptyLoop(e) (e)->flags & 0x02 RPRElemIsAbsorbableBranch(e) (e)->flags & 0x04 RPRElemIsAbsorbable(e) (e)->flags & 0x08 Example: PATTERN (A+ B | C) AST: ALT(SEQ(VAR(A,1,INF), VAR(B,1,1)), VAR(C,1,1)) Compilation result: idx varId depth min max next jump Description --------------------------------------------------- 0 ALT 0 1 1 1 -1 Alternation start 1 A(0) 1 1 INF 2 3 Branch 1: A+ 2 B(1) 1 1 1 4 -1 Branch 1: B -> FIN 3 C(2) 1 1 1 4 -1 Branch 2: C -> FIN 4 FIN 0 1 1 -1 -1 Pattern completion - idx 0: ALT marker. next(=1) is the start of the first branch - idx 1: Variable A. next(=2) is B, jump(=3) is the start of the second branch - idx 2: Variable B. next(=4) is FIN - idx 3: Variable C. next(=4) is FIN - idx 4: FIN marker. Match completion signal Roles of next and jump: - next: The next element to move to "after consuming" the current element. For VAR, the next position after a successful match. For BEGIN/END, the next position inside/outside the group. - jump: The element to "skip to." In ALT, a jump from one branch to the next branch. In BEGIN, a skip path to END+1 (for groups with min=0). In END, a loop-back to the start of the group body. Example: PATTERN ((A B)+) idx varId depth min max next jump Description --------------------------------------------------- 0 BEGIN 0 1 INF 1 4 Group start 1 A(0) 1 1 1 2 -1 A 2 B(1) 1 1 1 3 -1 B 3 END 0 1 INF 4 1 Group end 4 FIN 0 1 1 -1 -1 Pattern completion - idx 0: BEGIN. next(=1) enters the group body. jump(=4) skips to after END = FIN (used when min=0). - idx 3: END. next(=4) exits the group. jump(=1) loops back to the start of the group body. IV-4a. Reluctant Flag (RPR_ELEM_RELUCTANT) The reluctant flag is set during Phase 4 (fillRPRPattern) when the AST node has a non-negative reluctant field (reluctant >= 0). It reverses the priority of quantifier expansion at runtime: Greedy (default): try loop-back first, then exit (prefer longer match) Reluctant: try exit first, then loop-back (prefer shorter match) The flag is set on all elements that carry the quantifier: Simple VAR (A+?): RPR_ELEM_RELUCTANT on the VAR element Group ((...)+?): RPR_ELEM_RELUCTANT on BEGIN and END elements At runtime (nfa_advance), the flag controls DFS exploration order: VAR with quantifier: Greedy: primary path = next (continue), clone = jump (skip) Reluctant: primary path = jump (skip), clone = next (continue) END element: Greedy: primary path = jump (loop-back), clone = next (exit) Reluctant: primary path = next (exit), clone = jump (loop-back) BEGIN with min=0: Greedy: primary path = next (enter group), clone = jump (skip) Reluctant: primary path = jump (skip), clone = next (enter group) The absorption optimization requires greedy quantifiers. Reluctant quantifiers are excluded from absorbability analysis (see IV-5). IV-4b. Empty Loop Flag (RPR_ELEM_EMPTY_LOOP) The empty-loop flag is set during Phase 4 (fillRPRPatternGroup) on the END element when the group body is nullable -- i.e., every path through the body can match zero rows (all children are nullable). Example patterns that trigger this flag: (A?)* A is nullable (min=0), so group body is nullable -> END gets flag (A? B?)+ Both children nullable -> body nullable -> END gets flag (A | B*) B* is nullable, making the ALT nullable -> END gets flag The flag works in conjunction with the zero-consumption cycle detection (elemIdx visited bitmap). Without this flag, cycle detection alone would cause legitimate matches to fail. Problem example: (A*){2,3} - Iteration 1: A* consumes all available rows -> count=1, END reached - Loop-back for iteration 2: A* matches zero rows -> END reached again - Cycle detection sees the same elemIdx on the same row -> state killed - count never reaches min(2) -> match fails (incorrect) With the RPR_ELEM_EMPTY_LOOP flag, nfa_advance_end creates two paths: the normal loop-back (which cycle detection will eventually kill) and a fast-forward exit clone that bypasses the loop entirely. (See IX-4(c) for detailed runtime behavior.) - Zero-consumption is impossible since body is not nullable IV-5. Absorbability Analysis (RPR_ELEM_ABSORBABLE) Context absorption is an optimization technique that reduces O(n^2) to O(n). (Runtime behavior is described in Chapter VIII.) This phase determines whether the pattern has a structure suitable for the absorption optimization and sets flags on the relevant elements: RPR_ELEM_ABSORBABLE Absorption comparison point RPR_ELEM_ABSORBABLE_BRANCH Element within an absorbable region Eligibility conditions: (1) SKIP PAST LAST ROW (not NEXT ROW) (2) Frame end is UNBOUNDED FOLLOWING Structural conditions (isUnboundedStart + computeAbsorbabilityRecursive): Case 1: Simple VAR+ (e.g., A+) -> ABSORBABLE | ABSORBABLE_BRANCH set on the VAR Case 2: GROUP+ whose body consists only of {1,1} VARs (e.g., (A B)+) -> ABSORBABLE_BRANCH on children, ABSORBABLE | ABSORBABLE_BRANCH on END Case 3: GROUP+ whose body starts with VAR+ (e.g., (A+ B)+) -> Recurses from BEGIN into the body, applying Case 1. ABSORBABLE | ABSORBABLE_BRANCH set on A. B and END get no flags -> absorption stops once past A. Absorbability is determined per-element, not per-pattern. Absorption comparison is performed only when a state resides at an element with the RPR_ELEM_ABSORBABLE flag. Once a state leaves the flagged region, absorption is permanently disabled for that state. Through this mechanism, the runtime guarantees monotonicity: "a context that started earlier always subsumes a context that started later." Chapter V NFA Runtime Data Structures ================================================================================ V-1. RPRNFAState -- NFA State A single NFA state represents "how far the pattern has progressed." Field Description --------------------------------------------------------- elemIdx Index of the current pattern element counts[] Repetition count per group depth isAbsorbable Whether the state is in an absorbable region next Next state in the linked list The size of the counts array is (maxDepth + 1), allocated as a flexible array member at the end of the struct. Example: In PATTERN ((A B)+ C), a state waiting for B in the 3rd iteration Element array: [0:BEGIN(d0) 1:A(d1) 2:B(d1) 3:END(d0) 4:C(d0) 5:FIN] elemIdx = 2 (B, depth 1) counts[0] = 2 (depth 0: depth of END. Group completed 2 iterations) counts[1] = 1 (depth 1: depth of B. A matched in current iteration) Counts are indexed by depth, not by elemIdx. counts[0] is incremented when passing through END(depth 0), and the group repetition count is preserved even when the state is at B(depth 1). Definition of two states being "equal": Two states are equal if they have the same elemIdx and the same counts up to the depth of that element. nfa_states_equal() compares counts[0..elem->depth] using memcmp. Only counts at or below the depth of the current element are meaningful. V-2. RPRNFAContext -- Matching Context A single context represents "a matching attempt started from a specific start row." Field Description --------------------------------------------------------- states Linked list of active NFA states matchStartRow Row number where matching started matchEndRow Row number where matching completed (-1 if incomplete) lastProcessedRow Last row processed matchedState State that reached FIN (for greedy fallback) hasAbsorbableState Whether this context can absorb other contexts allStatesAbsorbable Whether this context can be absorbed next, prev Doubly-linked list Since the NFA is nondeterministic, multiple states can coexist simultaneously within a single context. Example: In PATTERN (A | B) C, if the first row matches both A and B, two states coexist within the context: State 1: elemIdx=3 (waiting for C, via branch A) State 2: elemIdx=3 (waiting for C, via branch B) In this case, since the (elemIdx, counts) of the two states are equal, nfa_add_state_unique() retains only State 1 (branch A), which was added first. Because DFS processes the first branch of ALT first, the state via A is registered first, and the state via B is discarded as a duplicate. This is the preferment guarantee. V-3. RPR Fields of WindowAggState nfaContext / nfaContextTail Doubly-linked list of active contexts nfaContextFree Reuse pool for contexts nfaStateFree Reuse pool for states nfaVarMatched Per-row cache: varMatched[varId] nfaVisitedElems Bitmap for cycle detection nfaStateSize Precomputed size of RPRNFAState Memory management: States and contexts are managed through their own free lists. Instead of palloc, they are obtained from the reuse pool, and returned to the pool upon deallocation. This reduces the overhead of frequent allocation/deallocation. Chapter VI NFA Execution: 3-Phase Model ================================================================================ VI-1. Entry Point and Overall Flow When the window function processes each row, row_is_in_reduced_frame() is called. This function determines whether the current row belongs to a matched frame, and if necessary, calls update_reduced_frame() to drive the NFA. Flow of update_reduced_frame(): (1) Find or create a context for the target row (2) Enter the row processing loop (3) After the loop ends, record the result in reduced_frame_map Pseudocode of the row processing loop: targetCtx = nfa_get_head_context(pos) if targetCtx == NULL: targetCtx = nfa_start_context(pos) for currentPos = startPos ... if not nfa_evaluate_row(currentPos): -- row does not exist nfa_finalize_all_contexts() -- finalize all contexts break nfa_process_row(currentPos) -- 3-phase processing nfa_start_context(currentPos + 1) -- pre-create next start point nfa_cleanup_dead_contexts() -- remove dead contexts if targetCtx->states == NULL: -- target context exhausted break Key point: Processing a single row may require processing multiple rows ahead. Due to the nature of window functions, determining the frame for row N requires looking at rows beyond N. VI-2. Context Creation: nfa_start_context() Creates a new context and performs the initial advance. (1) Allocate context via nfa_context_alloc() (2) Set matchStartRow = pos (3) Create initial state: elemIdx=0 (first pattern element), counts=all zero (4) Call nfa_advance(initialAdvance=true) The initial advance expands epsilon transitions at the beginning of the pattern. For example, the initial advance for PATTERN ((A | B) C): Start: elemIdx=0 (ALT) -> Expand ALT branches -> elemIdx=1 (A) -- VAR, so add state; stop here -> elemIdx=2 (B) -- VAR, so add state; stop here Result: Two states in the context {waiting for A, waiting for B} During the initial advance, reaching FIN is not recorded as a match. This is to prevent empty matches (zero-length matches). VI-3. Row Evaluation: nfa_evaluate_row() Evaluates all variable conditions in the DEFINE clause at once for the current row. for each defineClause[i]: result = ExecEvalExpr(defineClause[i]) varMatched[i] = (not null and true) To support row navigation operators such as PREV() and NEXT(), the previous row, current row, and next row are set in separate slots: ecxt_scantuple = previous row (for PREV reference) ecxt_outertuple = current row (default reference) ecxt_innertuple = next row (for NEXT reference) The varMatched array is referenced later in Phase 1 (Match). VI-4. nfa_process_row(): 3-Phase Processing NFA processing for a single row is divided into three phases: +--------------------------------------------+ | Phase 1: MATCH (convergence) | | Compare the current row against each VAR | | state. Remove states that fail to match. | | | | Phase 2: ABSORB (absorption) | | Merge duplicate contexts to prevent | | state explosion. | | | | Phase 3: ADVANCE (expansion) | | Expand epsilon transitions to prepare | | for the next row. | +--------------------------------------------+ This ordering is important: - Match executes first to "consume the current row." - Absorb executes immediately after Match, when states have been updated. - Advance executes last to prepare "states waiting for the next row." Chapter VII Phase 1: Match ================================================================================ nfa_match() iterates through each state in the context: (1) Check whether the state's elemIdx is a VAR element (2) Compare against the current row using nfa_eval_var_match() (3) Match success: increment repetition count, retain state (4) Match failure: remove state Match determination (nfa_eval_var_match): If varId is within the range of defineVariableList: Use the value of varMatched[varId] If varId exceeds the range (variable not defined in DEFINE): Unconditionally true (matches all rows) Immediate advance for simple VARs: For a VAR with min=1, max=1 where the next element is END, the Match phase processes through END immediately. This is necessary for accurate state comparison in Phase 2 (Absorb). Example: In PATTERN ((A B)+), when A matches, it immediately advances to B, and when B matches, it immediately advances through END to complete the group count. This enables absorption comparison with other contexts. Chapter VIII Phase 2: Absorb (Context Absorption) ================================================================================ VIII-1. Problem In the current implementation, a new context is started for each row processed. Applying PATTERN (A+) to 10 rows produces 10 contexts, each of which tracks state independently. If there are N rows, the total number of states becomes O(N^2): Context 1 (started at row 1): can match A up to N times Context 2 (started at row 2): can match A up to N-1 times ... Context N (started at row N): can match A 1 time VIII-2. Solution: Context Absorption Key observation: a context started earlier contains all matches of a later-started context (monotonicity principle). If Context 1 started at row 1 and matched A 5 times, the state where Context 2 (started at row 2) matched A 4 times is already contained within Context 1. Therefore Context 2 can be "absorbed" into Context 1. VIII-3. Absorption Conditions (1) The pattern is marked as isAbsorbable (see IV-5) (2) allStatesAbsorbable of the target context is true (3) An earlier context "covers" all states of the target Cover condition (nfa_states_covered): A state with the same elemIdx exists in the earlier context, and the count at that depth is greater than or equal -- then it is covered. VIII-4. Dual-Flag Design Two boolean flags make the absorption decision efficient: hasAbsorbableState (monotonic: only true->false transition possible) "Does this context have the ability to absorb other contexts?" true if at least one absorbable state exists. Transitions to false when states are removed leaving no absorbable states. Once false, it never becomes true again. allStatesAbsorbable (dynamic: can fluctuate) "Can this context be absorbed?" true if all states are in an absorbable region. Becomes false when a non-absorbable state is added; reverts to true when it is removed. VIII-5. Absorption Order nfa_absorb_contexts() traverses from tail (newest) to head (oldest). for ctx = tail to head: if ctx.allStatesAbsorbable: for older = ctx.prev to head: if older.hasAbsorbableState: if nfa_states_covered(older, ctx): free(ctx) -- absorbed break Since inspection starts from the newest context, the most recently started (= having the shortest match) context is absorbed first. Chapter IX Phase 3: Advance (Epsilon Transition Expansion) ================================================================================ IX-1. Overview nfa_advance() expands epsilon transitions from each state after Match, generating "new states waiting for the next row." An epsilon transition is a transition that moves without consuming a row: - ALT: branch to each alternative - BEGIN: enter group (or skip if min=0) - END: loop-back within group (or exit when condition is met) - FIN: record match completion - VAR loop/exit: repeat/exit according to the quantifier Expansion stops upon reaching a VAR element, and the state is added. This is because VAR is the element that "will consume the next row." IX-2. Processing Order: DFS and Preferment advance processes states in lexicographic order, performing Depth-First Search (DFS) on each state. This DFS order is what guarantees the SQL standard's "preferment": The branch that appears first in the PATTERN text takes precedence. Example: PATTERN (A | B) C The first branch A of the ALT takes precedence over the second branch B. When both A and B can match, the match via A is selected. nfa_add_state_unique() prevents duplicate addition of the same state, so the state added first (= from the preferred branch) is retained. IX-3. Routing Function: nfa_route_to_elem() All inter-element transitions in the advance phase go through nfa_route_to_elem(). This function branches its behavior based on the type of the next element: If the next element is VAR: (1) Add the state to the context (nfa_add_state_unique) (2) If the VAR has min=0, also add a skip path (recurse via next) -> Expansion stops here (VAR is the element that "will consume the next row") If the next element is non-VAR (ALT, BEGIN, END, FIN): -> Recursively call nfa_advance_state() to continue expansion With this structure, advance recursively follows epsilon transitions until reaching a VAR, consistently stopping only at VAR elements. IX-4. Per-Element advance Behavior (a) ALT (nfa_advance_alt) Upon encountering an ALT element, all branches are expanded in order. The first element of each branch is connected via a jump pointer. idx=0 (ALT) -> branch 1 start (next) -> branch 2 start (jump) -> ... nfa_advance_state() is recursively called for each branch. (b) BEGIN (nfa_advance_begin) Handles group entry. jump points to the element after END (= first element outside the group). Greedy (default): (1) Enter the group body (move via next, reset the count at that depth) (2) If min=0, also add a group skip path (move via jump) Reluctant: Order reversed -- skip path first, group entry second. If the skip path reaches FIN, the group entry path is not generated (shortest match preferred). (c) END (nfa_advance_end) Handles group termination. This is the core of the repetition logic. Let count be the count at the current depth: count < min: Loop-back (move via jump, repeat the group body) If the RPR_ELEM_EMPTY_LOOP flag is set: In addition to loop-back, also add a fast-forward exit path. This is because the body may produce an empty match, causing count to never reach min. fast-forward resets counts[depth] to 0 and exits via next (treating the remaining required iterations as empty matches). min <= count < max: Greedy: loop-back first, exit second Reluctant: exit first, loop-back second If the exit path reaches FIN, loop-back is omitted. count >= max: Unconditional exit (move via next) On exit: reset counts[depth] = 0, and if the next element is an outer END, increment the count at the outer depth. (d) VAR (nfa_advance_var) Handles repeat/exit for a VAR element with a quantifier. Let count be the count at the current depth: count < min: Unconditional loop (stay at the same elemIdx, wait for the next row) min <= count < max: Greedy: loop first, exit (next) second Reluctant: exit first, loop second If the exit path reaches FIN, loop is omitted. count >= max: Unconditional exit (move via next) On exit: reset counts[depth] = 0. (e) FIN Match success. The current state is moved to matchedState for recording, and matchEndRow is set to the current row. Upon reaching FIN, all remaining unprocessed states are removed (early termination). By DFS order, the path that reached FIN first has the highest preferment, so the rest are inferior paths. This is the core mechanism that guarantees preferment. In SKIP PAST LAST ROW mode, upon reaching FIN, subsequent contexts that started within the match range are immediately pruned. IX-5. State Deduplication: nfa_add_state_unique() When adding a new state to a context, it is compared against existing states; if an identical state already exists, it is not added. Comparison criteria: elemIdx + counts[0..elem->depth] (see V-1) This deduplication is the core mechanism that suppresses NFA state explosion. Because DFS order causes preferred-branch states to be added first, identical states from lower-priority branches are automatically discarded. IX-6. Cycle Detection: nfaVisitedElems When a group body can produce an empty match (zero consumption), looping back from END may cause an infinite loop. Example: PATTERN ((A?)*) A? has min=0, so it can pass through without matching. If the outer group repeats: BEGIN -> A? skip -> END -> BEGIN -> ... To prevent this: (1) At compile time: set the RPR_ELEM_EMPTY_LOOP flag on the END of groups whose body is nullable. The runtime effect of this flag is described in IX-4(c): when count < min, a fast-forward exit path is added, resolving the deadlock where count cannot increase due to empty matches. (2) At runtime: initialize the nfaVisitedElems bitmap immediately before DFS expansion of each state within advance (once per state). During DFS, set the corresponding elemIdx bit when visiting each element. If a previously visited elemIdx is revisited, that path is terminated. Note: the bitmap tracks only elemIdx and does not consider counts. Therefore, legitimate revisits to the same elemIdx but with different counts may also be blocked. This only occurs when the group body is nullable (all paths can match empty), causing END -> loop-back -> skip -> END within a single DFS. In such cases the END element has the RPR_ELEM_EMPTY_LOOP flag, so the fast-forward exit (IX-4(c)) provides an alternative path that bypasses the cycle. Chapter X Match Result Processing ================================================================================ X-1. Reduced Frame Map RPR match results are recorded in a byte array called reduced_frame_map. One byte is allocated per row, and the value is one of the following: RF_NOT_DETERMINED (0) Not yet processed RF_FRAME_HEAD (1) Start row of the match RF_SKIPPED (2) Interior row of the match (skipped in frame) RF_UNMATCHED (3) Match failure The window function references this map to determine frame inclusion for each row. X-2. AFTER MATCH SKIP Determines the starting point for the next match attempt after a successful match: SKIP TO NEXT ROW: New match attempt begins from the row after the match start row. Overlapping matches are possible. SKIP PAST LAST ROW: New match attempt begins from the row after the match end row. Only non-overlapping matches are possible. X-3. INITIAL vs SEEK Standard definition (section 6.12): INITIAL: "is used to look for a match whose first row is R." SEEK: "is used to permit a search for the first match anywhere from R through the end of the full window frame." In either case, if there is no match, the reduced window frame is empty. The default is INITIAL. Current implementation: SEEK is not supported (the parser raises an error). Only INITIAL is supported, searching only for matches starting at each row position pos. Chapter XI Worked Example: Full Execution Trace ================================================================================ XI-1. Query SELECT company, tdate, price, first_value(price) OVER w AS start_price, last_value(price) OVER w AS end_price FROM stock WINDOW w AS ( PARTITION BY company ORDER BY tdate ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B) DEFINE A AS price > PREV(price), B AS price < PREV(price) ); XI-2. Data Row# tdate price -------------------------- 0 2024-01-01 100 1 2024-01-02 110 2 2024-01-03 120 3 2024-01-04 115 4 2024-01-05 130 XI-3. Compilation Result PATTERN (A+ B) -> unchanged after optimization idx varId depth min max next jump ------------------------------------------ 0 A(0) 0 1 INF 1 -1 A+ 1 B(1) 0 1 1 2 -1 B 2 FIN 0 1 1 -1 -1 DEFINE: A -> "price > PREV(price)", B -> "price < PREV(price)" isAbsorbable = true (A+ is a simple unbounded VAR) XI-4. Execution Trace --- Row 0 (price=100) --- update_reduced_frame(0) called. Context C0 created (matchStartRow=0). Initial advance: elemIdx=0(A) -> VAR, so state is added. C0.states = [{elemIdx=0, counts=[0]}] nfa_evaluate_row(0): A: price(100) > PREV(price) -> no PREV -> false B: price(100) < PREV(price) -> no PREV -> false varMatched = [false, false] nfa_process_row(0): Phase 1 (Match): A(0) state vs varMatched[0]=false -> state removed C0.states = [] (empty) Phase 2 (Absorb): skipped (no states) Phase 3 (Advance): skipped (no states) C0.states is empty, so the loop terminates. matchEndRow < matchStartRow -> RF_UNMATCHED. register_reduced_frame_map(0, RF_UNMATCHED). --- Row 1 (price=110) --- update_reduced_frame(1) called. Context C1 created (matchStartRow=1). Initial advance: C1.states = [{elemIdx=0, counts=[0]}] nfa_evaluate_row(1): A: 110 > PREV(100) -> true B: 110 < PREV(100) -> false varMatched = [true, false] nfa_process_row(1): Phase 1 (Match): A(0) match succeeds -> counts[0]++ -> counts=[1] C1.states = [{elemIdx=0, counts=[1]}] Phase 3 (Advance): State {elemIdx=0, counts=[1]}: A+ (min=1, count=1, max=INF) count >= min, so: Greedy -> loop first: keep {elemIdx=0, counts=[1]} exit: reset counts[0]=0, next(=1) -> {elemIdx=1, counts=[0]} C1.states = [{elemIdx=0, counts=[1]}, {elemIdx=1, counts=[0]}] --- Row 2 (price=120) --- Context C2 created (matchStartRow=2). Initial advance: C2.states = [{elemIdx=0, counts=[0]}] nfa_evaluate_row(2): A: 120 > PREV(110) -> true B: 120 < PREV(110) -> false varMatched = [true, false] C1 nfa_process_row(2): Phase 1 (Match): {elemIdx=0, counts=[1]}: A matches -> counts=[2] {elemIdx=1, counts=[0]}: B does not match -> removed C1.states = [{elemIdx=0, counts=[2]}] C2 nfa_process_row(2): Phase 1 (Match): {elemIdx=0, counts=[0]}: A matches -> counts=[1] C2.states = [{elemIdx=0, counts=[1]}] Phase 2 (Absorb): Does C1 (started earlier) cover C2? C1: {elemIdx=0, counts=[2]}, C2: {elemIdx=0, counts=[1]} Same elemIdx, C1.counts >= C2.counts -> covered C2 absorbed. -> removed. Phase 3 (Advance): {elemIdx=0, counts=[2]}: Greedy -> loop + exit Loop: {elemIdx=0, counts=[2]} Exit: reset counts[0]=0, next(=1) -> {elemIdx=1, counts=[0]} C1.states = [{elemIdx=0, counts=[2]}, {elemIdx=1, counts=[0]}] Context C3 created (matchStartRow=3). --- Row 3 (price=115) --- nfa_evaluate_row(3): A: 115 > PREV(120) -> false B: 115 < PREV(120) -> true varMatched = [false, true] nfa_process_row(3): Phase 1 (Match): {elemIdx=0, counts=[2]}: A does not match -> removed {elemIdx=1, counts=[0]}: B matches -> counts=[1] C1.states = [{elemIdx=1, counts=[1]}] Phase 3 (Advance): {elemIdx=1, counts=[1]}: B (min=1, max=1) count(1) >= max(1) -> unconditional exit Reset counts[0]=0, next = 2 (FIN) FIN reached -> matchEndRow = 3, matchedState recorded. Early termination: no remaining states, so completed immediately. C1.states = [] (empty after reaching FIN) C1.states is empty and matchEndRow=3 >= matchStartRow=1 -> match succeeds. register_reduced_frame_map(1, RF_FRAME_HEAD) register_reduced_frame_map(2, RF_SKIPPED) register_reduced_frame_map(3, RF_SKIPPED) --- Row 4 (price=130) --- update_reduced_frame(4) called. C3 was already created but matchStartRow=3, so it is not applicable. New context C4 created (matchStartRow=4). nfa_evaluate_row(4): A: 130 > PREV(115) -> true B: 130 < PREV(115) -> false ... No subsequent rows, so nfa_finalize_all_contexts() is called. Match incomplete -> RF_UNMATCHED. XI-5. Final Result Row 0: RF_UNMATCHED -> frame = the row itself Row 1: RF_FRAME_HEAD -> frame = rows 1 through 3 Row 2: RF_SKIPPED -> inside row 1's match Row 3: RF_SKIPPED -> inside row 1's match Row 4: RF_UNMATCHED -> frame = the row itself Chapter XII Summary of Key Design Decisions ================================================================================ XII-1. Flat Array vs Tree-Based NFA Choice: Flat array (RPRPatternElement[]) Rationale: - Cache-friendly: 16-byte fixed size, contiguous memory - Index-based references: 2-byte indices instead of pointers - Easy to serialize: can use memcpy when passing to plan nodes XII-2. Forward-only Execution vs Backtracking Choice: Forward-only (state set tracking) Rationale: - Backtracking takes exponential time in the worst case - NFA simulation guarantees polynomial time - DFS order naturally guarantees preferment. Greedy/reluctant per quantifier requires only reversing the DFS order - Window functions receive sorted rows sequentially. Forward-only fits directly into this pipeline, whereas backtracking requires re-fetching previous rows - DEFINE conditions are SQL expressions (PREV, RUNNING aggregates, etc.) with high re-evaluation cost. Forward-only requires only one evaluation per row XII-3. Per-Context Management Choice: Independent context per start row Rationale: - Supports overlapping matches under SKIP TO NEXT ROW - Determines the frame for each row independently - Absorption optimization can eliminate redundant contexts in O(n) XII-4. Memory Pool Management Choice: Custom free list Rationale: - NFA states are created and destroyed in large numbers per row - Avoids palloc/pfree overhead - State size is variable (counts[] array), but within a single query maxDepth is fixed, so all states have the same size XII-5. Execution Optimization Summary The following optimizations make the NFA simulation practical. -- Compile-time -- (1) AST Optimization (IV-3) Simplifies the AST before converting the pattern to an NFA. Reduces the number of NFA elements through consecutive variable merging (A A -> A{2}), SEQ flattening, quantifier multiplication, and other transformations. Significance: Reducing the element count directly shrinks the state space, decreasing the cost of all subsequent runtime phases (match, absorb, advance). -- Runtime: advance phase -- (2) Group Skip (IX-4(b)) At the BEGIN of a group with min=0, uses jump to skip the entire group. Moves directly to the first element outside the group without exploring the group body. Greedy enters then skips; Reluctant skips then enters. Significance: For optional groups (min=0), immediately generates a skip path without exploring the body, avoiding unnecessary DFS expansion. (3) State Deduplication (IX-5) During advance, DFS may generate states with the same (elemIdx, counts) combination through multiple paths. Additionally, unlike VAR repetition, group repetition cannot perform absorption comparison using VAR states, so inline advance is performed from after Phase 1 match through to END; this process can also produce duplicate states reaching the same END. nfa_add_state_unique() blocks duplicate addition of identical states in both cases. Significance: Prevents exponential growth of the state count in ALT branches and quantifier expansion. Since DFS order causes the preferred branch's state to be registered first, identical states from lower-priority branches are automatically discarded, thereby also guaranteeing preferment. (4) Cycle Detection and Fast-Forward (IX-6, IX-4(c)) When a nullable group body (e.g., A?) repeats empty matches, the END -> BEGIN loop-back can continue indefinitely. Two mechanisms resolve this: - A visited bitmap (nfaVisitedElems) blocks revisitation of the same element, preventing infinite loops (safety) - At an END with the RPR_ELEM_EMPTY_LOOP flag set, when count < min, the remaining required iterations are treated as empty matches and a fast-forward exit path out of the group is added (correctness) Significance: Cycle detection guarantees termination, and fast-forward guarantees that the min condition is satisfied. Without these, patterns containing nullable groups would fall into infinite loops or fail to match. (5) Match Pruning (IX-4(e)) When a state reaches FIN during advance, all remaining unprocessed states of that context are removed. Because of DFS order, the path that reaches FIN first has the highest preferment, so the remaining paths are inferior. Significance: Once the best match is determined, exploration of inferior paths is immediately terminated. This mechanism achieves both preferment guarantees and performance optimization. -- Runtime: inter-context -- (6) Early Termination (SKIP PAST LAST ROW) In SKIP PAST LAST ROW mode, when a match is found, subsequent contexts whose start rows fall within the match range are pruned immediately without further processing. In SKIP TO NEXT ROW mode, overlapping contexts are preserved because each row requires its own independent match. Significance: Prunes subsequent contexts whose start rows overlap with a prior match range, avoiding unnecessary processing. (7) Context Absorption (Chapter VIII) If an independent context is created for each row, O(n^2) states accumulate. By exploiting the monotonicity that an earlier-started context subsumes the states of a later-started context, redundant contexts are eliminated early. Absorbability is determined per-element; comparison is performed only at elements with the RPR_ELEM_ABSORBABLE flag (see IV-5). Significance: Keeps the number of active contexts at a constant level, achieving O(n^2) -> O(n) time complexity. Without this, performance degrades sharply on long partitions. Appendix A. Key Function Index ================================================================================ Function File Role ---------------------------------------------------------------------- transformRPR parse_rpr.c Parser entry point transformDefineClause parse_rpr.c DEFINE transformation collectPatternVariables rpr.c Variable collection filterDefineClause rpr.c DEFINE filtering buildRPRPattern rpr.c NFA compilation main optimizeRPRPattern rpr.c AST optimization fillRPRPattern rpr.c NFA element generation finalizeRPRPattern rpr.c Finalization computeAbsorbability rpr.c Absorption analysis update_reduced_frame nodeWindowAgg.c Execution main loop nfa_start_context nodeWindowAgg.c Context creation nfa_process_row nodeWindowAgg.c 3-phase processing nfa_evaluate_row nodeWindowAgg.c DEFINE evaluation nfa_match nodeWindowAgg.c Phase 1 nfa_absorb_contexts nodeWindowAgg.c Phase 2 nfa_advance nodeWindowAgg.c Phase 3 nfa_advance_state nodeWindowAgg.c Per-state branching nfa_route_to_elem nodeWindowAgg.c Element routing nfa_advance_alt nodeWindowAgg.c ALT handling nfa_advance_begin nodeWindowAgg.c BEGIN handling nfa_advance_end nodeWindowAgg.c END handling nfa_advance_var nodeWindowAgg.c VAR handling nfa_add_state_unique nodeWindowAgg.c Deduplication nfa_states_covered nodeWindowAgg.c Absorption coverage check Appendix B. Data Structure Relationship Diagram ================================================================================ Parser Layer -------- RPCommonSyntax |--- rpSkipTo: RPSkipTo |--- initial: bool +--- rpPattern: RPRPatternNode* (tree) |--- nodeType: VAR | SEQ | ALT | GROUP |--- min, max: quantifier |--- varName: variable name (VAR only) +--- children: List* (SEQ/ALT/GROUP only) Planner Layer ---------- WindowAgg (plan node) |--- rpSkipTo: RPSkipTo |--- defineClause: List +--- rpPattern: RPRPattern* |--- numVars: int |--- varNames: char** |--- maxDepth: RPRDepth |--- isAbsorbable: bool |--- numElements: int +--- elements: RPRPatternElement[] (flat array) |--- varId (1B) |--- depth (1B) |--- flags (1B) |--- reserved (1B) |--- min, max (4B + 4B) +--- next, jump (2B + 2B) Executor Layer ---------- WindowAggState |--- rpPattern: RPRPattern* (copied from plan) |--- defineClauseList: List |--- nfaVarMatched: bool[] (per-row cache) |--- nfaVisitedElems: bitmapword* (cycle detection) |--- nfaContext <-> nfaContextTail (doubly-linked list) | +--- RPRNFAContext | |--- states: RPRNFAState* (linked list) | | |--- elemIdx | | |--- counts[] | | +--- isAbsorbable | |--- matchStartRow, matchEndRow | |--- matchedState (cloned on FIN arrival) | |--- hasAbsorbableState | +--- allStatesAbsorbable |--- nfaContextFree (recycling pool) +--- nfaStateFree (recycling pool) Appendix C. NFA Element Array Examples ================================================================================ C-1. PATTERN (A B C) idx varId depth min max next jump ------------------------------------------ 0 A 0 1 1 1 -1 1 B 0 1 1 2 -1 2 C 0 1 1 3 -1 3 FIN 0 1 1 -1 -1 C-2. PATTERN (A+ B*) idx varId depth min max next jump flags ------------------------------------------------- 0 A 0 1 INF 1 -1 ABSORBABLE | ABSORBABLE_BRANCH 1 B 0 0 INF 2 -1 2 FIN 0 1 1 -1 -1 Only A+ is the absorption point (Case 1). Once past A, absorption is permanently disabled for that state. C-3. PATTERN (A | B | C) idx varId depth min max next jump ------------------------------------------ 0 ALT 0 1 1 1 -1 alternation start 1 A 1 1 1 4 2 branch 1 -> FIN, jump -> branch 2 2 B 1 1 1 4 3 branch 2 -> FIN, jump -> branch 3 3 C 1 1 1 4 -1 branch 3 -> FIN 4 FIN 0 1 1 -1 -1 C-4. PATTERN ((A B)+ C) idx varId depth min max next jump flags ----------------------------------------------------------- 0 BEGIN 0 1 INF 1 4 ABSORBABLE_BRANCH 1 A 1 1 1 2 -1 ABSORBABLE_BRANCH 2 B 1 1 1 3 -1 ABSORBABLE_BRANCH 3 END 0 1 INF 4 1 ABSORBABLE | ABSORBABLE_BRANCH 4 C 0 1 1 5 -1 5 FIN 0 1 1 -1 -1 Case 2: GROUP+ with {1,1} body VARs. A, B are branches; END is the absorption point. Compare with C-6 (Case 3). C-5. PATTERN ((A | B)+? C) idx varId depth min max next jump flags --------------------------------------------------- 0 BEGIN 0 1 INF 1 5 RELUCTANT, group start 1 ALT 1 1 1 2 -1 alternation start 2 A 2 1 1 4 3 branch 1 3 B 2 1 1 4 -1 branch 2 4 END 0 1 INF 5 1 RELUCTANT, group end 5 C 0 1 1 6 -1 6 FIN 0 1 1 -1 -1 C-6. PATTERN ((A+ B)+ C) -- Absorbability flag example idx varId depth min max next jump flags ---------------------------------------------------------- 0 BEGIN 0 1 INF 1 4 ABSORBABLE_BRANCH, group start 1 A 1 1 INF 2 -1 ABSORBABLE | ABSORBABLE_BRANCH 2 B 1 1 1 3 -1 3 END 0 1 INF 4 1 group end 4 C 0 1 1 5 -1 5 FIN 0 1 1 -1 -1 Recurses from BEGIN into the body -> A matches Case 1 (simple VAR+). A gets ABSORBABLE | ABSORBABLE_BRANCH, BEGIN gets ABSORBABLE_BRANCH. B and END get no flags -> absorption stops once the state advances to B. (See IV-5 Case 3) C-7. PATTERN ((A+ B | C*)+ D) -- Per-branch absorption in ALT idx varId depth min max next jump flags ---------------------------------------------------------- 0 BEGIN 0 1 INF 1 6 ABSORBABLE_BRANCH 1 ALT 1 1 1 2 -1 ABSORBABLE_BRANCH 2 A 2 1 INF 3 4 ABSORBABLE | ABSORBABLE_BRANCH 3 B 2 1 1 5 -1 4 C 2 0 INF 5 -1 ABSORBABLE | ABSORBABLE_BRANCH 5 END 0 1 INF 6 1 EMPTY_LOOP 6 D 0 1 1 7 -1 7 FIN 0 1 1 -1 -1 ALT branches are checked independently for absorbability. Branch 1: A+ matches Case 1 -> A gets ABSORBABLE. B has no flag. Branch 2: C* matches Case 1 -> C gets ABSORBABLE. Both A and C get ABSORBABLE_BRANCH as part of their respective branch paths. END has EMPTY_LOOP: branch 2 (C*) is nullable, making the group body nullable. BEGIN and ALT get ABSORBABLE_BRANCH (on the path to absorbable elements). ================================================================================ End of document ================================================================================