| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-02-14 14:58:10 |
| Message-ID: | CAAAe_zB+W+xBJpYNzRLzh6BH9HrFycTjoPUnVvnU1BT_7RR8Bw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Tatsuo,
This round focused on reviewing nodeWindowAgg.c -- the NFA executor
-- which was the last piece remaining on my review list. I've also
added a comprehensive NFA runtime test suite.
The attached patch continues the incremental series on top of v42.
Here are the changes:
FIXME issues documented:
These require non-trivial structural changes, so I've documented
them as FIXME comments for now rather than attempting a fix.
- altPriority tracks only the last ALT choice, so repeated or nested
ALTs like (A|B)+ cannot correctly implement SQL standard lexical
ordering. A full-path classifier structure is needed.
- Cycle prevention condition (count == 0 && min == 0) is insufficient
for patterns like (A*)* where cycles occur at count > 0. Currently
relies on implicit duplicate detection in nfa_add_state_unique.
Bugs fixed:
- Fix nfa_advance_begin() routing order: enter group first (lexically
first), then skip path (lexically second).
- Add ALT scope boundary check in nfa_advance_alt() and
computeAbsorbabilityRecursive() to stop branch traversal when
element depth exits ALT scope.
- Disallow RANGE and GROUPS frame types with row pattern
recognition, as the SQL standard only requires ROWS.
Also remove the now-dead frame-type determination code.
NFA executor refactoring (nodeWindowAgg.c):
- Simplify nfa_process_row() phase 1: remove nfa_advance() call
after forced mismatch at frame boundary, since all states are
already at VAR positions and mismatch removes them. Replace
the redundant phase 3 frame boundary check with Assert.
- Simplify update_reduced_frame() result registration: mismatch
path now returns early, and the post-processing SKIP mode cleanup
block (nfa_remove_contexts_up_to vs nfa_context_free) is removed
since eager pruning handles SKIP PAST LAST ROW and both paths
now simply call nfa_context_free().
- Replace nfa_remove_contexts_up_to() with eager pruning inside
nfa_add_matched_state(). When matchEndRow is extended during
greedy matching, pending contexts within the match range are
pruned incrementally instead of after match completion. For
SKIP PAST LAST ROW patterns like START UP+, this reduces the
number of live contexts at each row from O(n) to O(1), avoiding
O(n^2) per-row processing of contexts that would be skipped
anyway.
- Optimize nfa_states_equal() to compare counts only up to the
current element's depth instead of the full maxDepth.
- Rename nfa_state_clone -> nfa_state_create,
nfa_find_context_for_pos -> nfa_get_head_context for clarity.
- Add nfa_record_context_success/skipped/absorbed statistics helpers.
- Remove unused RPRPattern parameter from
nfa_update_absorption_flags().
Absorbability refactoring (rpr.c):
- Remove parentDepth parameter from isUnboundedStart() and
computeAbsorbabilityRecursive(). Scope boundaries are now
determined by element depth comparison instead of an explicit
parent depth parameter.
- Simplify isUnboundedStart(): check simple VAR case first, then
GROUP case with a depth-bounded loop instead of a FIN-terminated
traversal with multiple break conditions.
Test changes:
- Add rpr_nfa.sql: comprehensive NFA runtime test suite covering
quantifier boundaries (min/max/exact), alternation priority, nested
patterns, frame boundary variations, INITIAL mode (syntax error
expected), pathological patterns, context absorption, and FIXME
reproduction cases.
- Add nth_value out-of-frame tests, ReScan/LATERAL test, and
nth_value IGNORE NULLS test to rpr.sql.
- Change selected EXPLAIN test queries in rpr_explain.sql from
EXPLAIN (COSTS OFF) to EXPLAIN (ANALYZE, ...) to verify actual
NFA execution statistics.
- Fix stale comments across rpr.sql, rpr_base.sql, and rpr_nfa.sql:
remove resolved BUG annotations, update error messages to match
actual output, correct optimization result descriptions, and
standardize Expected comment placement to after SQL statements.
Other changes:
- Run pgindent on RPR source files. Add /*----------*/ comment
guards to protect structured comments from reformatting.
Coverage:
I ran gcov on the modified lines (diff-only coverage). The attached
coverage.zip contains an HTML report. Summary:
- 18 files, 124 functions, 2238 modified lines analyzed
- Overall: 92.1% line coverage (2061/2238)
- Core files (nodeWindowAgg.c, rpr.c, explain.c, parse_rpr.c):
98.0-98.5% each
- Node serialization (outfuncs.c, readfuncs.c, equalfuncs.c):
0% -- these implement RPRPattern serialization for plan caching,
but regression tests don't exercise prepared statements with RPR
- ruleutils.c: 96.3% -- untested lines are the reluctant quantifier
display path ({1}?), which is currently rejected at parse time
The node serialization functions (141 lines, 0% coverage) are the
largest untested area. I'm not sure how to trigger these paths
in the regression test framework. Any suggestions?
I'll send a separate email within a few days listing the FIXME
issues and other unresolved items from the mailing list discussion
for your review.
Best regards,
Henson
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-non-ASCII-characters-in-RPR-code-and-comments.txt | text/plain | 13.5 KB |
| 0002-Initialize-NFA-per-partition-counters-in-ExecInitWindowAgg.txt | text/plain | 892 bytes |
| 0005-Disallow-RANGE-and-GROUPS-frame-types-with-RPR.txt | text/plain | 4.7 KB |
| 0003-Remove-FIXME-and-normalize-Storage-values.txt | text/plain | 175.1 KB |
| 0004-Fix-RPR-pattern-compilation-crash-and-refactor-EXPLAIN-deparse.txt | text/plain | 256.9 KB |
| 0007-Run-pgindent-on-RPR-source-files.txt | text/plain | 4.7 KB |
| 0006-Review-NFA-executor-and-add-comprehensive-runtime-tests.txt | text/plain | 281.7 KB |
| 0008-Fix-RPR-documentation-correct-depth-limit-and-EXPLAI.txt | text/plain | 2.5 KB |
| coverage.zip | application/zip | 233.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | KAZAR Ayoub | 2026-02-14 15:02:21 | Re: Speed up COPY TO text/CSV parsing using SIMD |
| Previous Message | Tatsuo Ishii | 2026-02-14 10:20:33 | Re: Questionable description about character sets |