| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-01-02 02:30:31 |
| Message-ID: | CAAAe_zBESHXy0zvCHvYQvFtyk98dtyJAhsp3R6HvyXdP-n4-Qw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Ishii-san,
Glad the review was useful!
2026년 1월 2일 (금) AM 11:16, Tatsuo Ishii <ishii(at)postgresql(dot)org>님이 작성:
> Hi Henson,
>
> Thank you for the review! I will take care the issues (it will take
> sometime).
>
>
If you have other patches that need pre-analysis (coverage, style,
valgrind, etc.), feel free to ask anytime. I enjoy this kind of work.
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>
> > SQL/RPR Patch Review Report
> > ============================
> >
> > Patch: Row Pattern Recognition (SQL:2016)
> > Commitfest: https://commitfest.postgresql.org/patch/4460
> >
> > Review Methodology:
> > This review focused on quality assessment, not line-by-line code audit.
> > Key code paths and quality issues were examined with surrounding
> context
> > when concerns arose. Documentation files were reviewed with AI-assisted
> > grammar and typo checking. Code coverage was measured using gcov and
> > custom analysis tools.
> >
> > Limitations:
> > - Not a security audit or formal verification
> > - Parser and executor internals reviewed at module level, not
> exhaustively
> > - Performance characteristics not benchmarked
> >
> >
> > TABLE OF CONTENTS
> > -----------------
> >
> > 1. Executive Summary
> > 2. Issues Found
> > 2.1 Critical / Major
> > 2.2 Minor Issues (Review Needed)
> > 2.3 Minor Issues (Style)
> > 2.4 Suggestions for Discussion
> > 3. Test Coverage
> > 3.1 Covered Areas
> > 3.2 Untested Items
> > 3.3 Unimplemented Features (No Test Needed)
> > 4. Code Coverage Analysis
> > 4.1 Overall Coverage
> > 4.2 Coverage by File
> > 4.3 Uncovered Code Risk Assessment
> > 5. Commit Analysis
> > 6. Recommendations
> > 7. Conclusion
> >
> >
> > 1. EXECUTIVE SUMMARY
> > --------------------
> >
> > Overall assessment: GOOD
> >
> > The SQL/RPR patch demonstrates solid implementation quality within its
> > defined scope. Code follows PostgreSQL coding standards (with minor style
> > issues), test coverage is comprehensive at 96.4%, and documentation is
> > thorough with only minor typos.
> >
> > Rating by Area:
> > - Code Quality: Good (PostgreSQL style compliant, 26 static style
> fixes
> > recommended)
> > - Test Coverage: Good (96.4% line coverage, 28 test cases)
> > - Documentation: Good (Complete, 1 minor typo)
> > - Build/Regress: Pass (make check-world, 753 tests passed)
> >
> >
> > 2. ISSUES FOUND
> > ---------------
> >
> > 2.1 Critical / Major
> >
> > #1 [Code] Greedy pattern combinatorial explosion
> > Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS
> TRUE
> > Impact: Memory exhaustion or infinite wait
> > Recommendation: Add work_mem-based memory limit (error on exceed)
> >
> > Evidence - No memory limit in current code:
> > - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
> > - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2;
> repalloc()
> > unlimited
> > - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
> >
> > Only work_mem usage in RPR (nodeWindowAgg.c:1297):
> > winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
> > -> For tuple buffer, unrelated to pattern combination memory
> (StringSet)
> >
> > 2.2 Minor Issues (Review Needed)
> >
> > #1 [Code] nodeWindowAgg.c:5849,5909,5912
> > pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> > elements
> >
> > Reproduction:
> > - PATTERN (A B C ... Z A) (27 elements) -> OK
> > - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not
> valid
> > char: b"
> >
> > 2.3 Minor Issues (Style)
> >
> > #1 [Code] nodeWindowAgg.c (25 blocks)
> > #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
> >
> > #2 [Code] src/backend/executor/nodeWindowAgg.c
> > static keyword on separate line (26 functions)
> >
> > #3 [Code] src/backend/utils/adt/ruleutils.c
> > Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
> >
> > #4 [Code] nodeWindowAgg.c:4619
> > Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL
> style)
> >
> > #5 [Doc] doc/src/sgml/ref/select.sgml:1128
> > Typo: "maximu" -> "maximum"
> >
> > 2.4 Suggestions for Discussion
> >
> > #1 Incremental matching with streaming NFA redesign
> > Benefits:
> > - O(n) time complexity per row (current: exponential in worst case)
> > - Bounded memory via state merging and context absorption
> > - Natural extension for OR patterns, {n,m} quantifiers, nested groups
> > - Enables MEASURES clause with incremental aggregates
> > - Proven approach in CEP engines (Flink, Esper)
> >
> >
> > 3. TEST COVERAGE
> > ----------------
> >
> > 3.1 Covered Areas
> >
> > - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> > - DEFINE clause: Variable conditions, auto-TRUE for missing (line
> 120-133)
> > - PREV/NEXT functions: Single argument (line 44, 173)
> > - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> > - Aggregate integration: sum, avg, count, max, min (line 258-277)
> > - Window function integration: first_value, last_value, nth_value (line
> > 34-35)
> > - JOIN/CTE: JOIN (line 313), WITH (line 324)
> > - View: VIEW creation, pg_get_viewdef (line 390-406)
> > - Error cases: 7 error scenarios (line 409-532)
> > - Large partition: 5000 row smoke test (line 360-387)
> > - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
> >
> > 3.2 Untested Items
> >
> > Feature Priority Recommendation
> >
> -------------------------------------------------------------------------------
> > 26 variable limit Medium Test 26 variables success, 27th
> > variable error
> > NULL value handling Low Test PREV(col) where col or previous
> > row is NULL
> >
> > Sample test for 26 variable limit:
> >
> > -- Should fail with 27th variable (parser error, no table needed)
> > SELECT * FROM (SELECT 1 AS x) t
> > WINDOW w AS (
> > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> > PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
> > DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> > TRUE,
> > g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> > TRUE,
> > m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> > TRUE,
> > s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> > TRUE,
> > y AS TRUE, z AS TRUE, aa AS TRUE
> > );
> > -- ERROR: number of row pattern definition variable names exceeds 26
> >
> > Sample test for NULL handling:
> >
> > CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price
> INTEGER);
> > INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
> > INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL); -- NULL in
> > middle
> > INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
> > INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
> >
> > SELECT company, tdate, price, count(*) OVER w AS match_count
> > FROM stock_null
> > WINDOW w AS (
> > PARTITION BY company
> > ORDER BY tdate
> > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> > PATTERN (START UP DOWN)
> > DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> > PREV(price)
> > );
> > -- Result: all rows show match_count = 0 (NULL breaks pattern
> matching)
> >
> > 3.3 Unimplemented Features (No Test Needed)
> >
> > Per documentation (select.sgml:1133-1136):
> > - MEASURES clause: Not implemented
> > - SUBSET clause: Not implemented
> > - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
> >
> > Not in documentation, verified by testing:
> > - {n,m} quantifier: Parser error ("syntax error at or near {")
> > - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> > - (A B)+ compound repetition: Parser accepts, but execution returns 0
> > matches
> >
> >
> > 4. CODE COVERAGE ANALYSIS
> > -------------------------
> >
> > 4.1 Overall Coverage
> >
> > 96.4% (732 / 759 lines)
> >
> > 4.2 Coverage by File (major RPR-modified files)
> >
> > nodeWindowAgg.c: 96.6% (560/580) - Pattern matching execution engine
> > parse_clause.c: 96.7% (88/91) - PATTERN/DEFINE analysis
> > ruleutils.c: 93.1% (54/58) - pg_get_viewdef output
> >
> > 4.3 Uncovered Code Risk Assessment
> >
> > Total: 27 lines uncovered
> >
> > Medium Risk (2 items) - Test addition recommended (see section 3.2):
> > - parse_clause.c:4093: transformDefineClause - 27th variable error
> > - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> > row
> >
> > Low Risk (25 items) - Defensive code / Unreachable:
> > - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> > - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> > - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> > - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> > - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> > - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> > - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> > - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> > - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> > - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> > - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> > - nodeWindowAgg.c:5776: string_set_get - Index range check
> > - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> > - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> > - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> > - parse_clause.c:3989: transformDefineClause - A_Expr type check
> > - parse_clause.c:4145: transformPatternClause - A_Expr type check
> > - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
> >
> > Conclusion: Most uncovered code consists of defensive error handling or
> > code unreachable due to parser pre-validation. No security or functional
> > risk.
> >
> >
> > 5. COMMIT ANALYSIS
> > ------------------
> >
> > 8 sequential commits:
> >
> > Commit Area Files +/- Key Content
> >
> -------------------------------------------------------------------------------
> > 1 Raw Parser 4 +174/-16 gram.y grammar
> (PATTERN/DEFINE)
> > 2 Parse/Analysis 4 +277/-1 parse_clause.c analysis logic
> > 3 Rewriter 1 +109/-0 pg_get_viewdef extension
> > 4 Planner 5 +73/-3 WindowAgg plan node extension
> > 5 Executor 4 +1,942/-11 CORE: Pattern matching engine
> > (+1,850)
> > 6 Docs 3 +192/-7 advanced.sgml,
> func-window.sgml
> > 7 Tests 3 +1,585/-1 rpr.sql (532), rpr.out
> (1,052)
> > 8 typedefs 1 +6/-0 pgindent support
> >
> > Code Change Statistics:
> > - Total files: 25
> > - Lines added: 4,358
> > - Lines deleted: 39
> > - Net increase: +4,319 lines
> >
> >
> > 6. RECOMMENDATIONS
> > ------------------
> >
> > 6.1 Combinatorial Explosion Prevention (Major, Required)
> >
> > Add work_mem-based memory limit for StringSet allocation.
> > Location: string_set_add() in nodeWindowAgg.c:5741-5750
> > Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
> >
> > 6.2 Code Review Required (Minor, Decision Needed)
> >
> > Location: nodeWindowAgg.c:5849,5909,5912
> > Issue: pos > NUM_ALPHABETS check intent unclear
> > Current: PATTERN with 28 elements causes error
> > Question: Is 27 element limit intentional?
> >
> > 6.3 Code Style Fixes (Minor)
> >
> > - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> > - static keyword on separate line: 26 functions to fix
> > - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> > - Error message case: "Unrecognized" -> "unrecognized"
> >
> > 6.4 Documentation Fixes (Minor)
> >
> > - select.sgml: "maximu" -> "maximum"
> >
> > 6.5 Test Additions (Optional)
> >
> > Black-box Tests (Functional):
> >
> > Feature Test Case
> Priority
> >
> -------------------------------------------------------------------------------
> > Variable limit 26 variables success, 27 error Medium
> > NULL boundary PREV at partition first row Medium
> >
> > White-box Tests (Coverage) - covered by above black-box tests:
> >
> > File:Line Target Branch
> >
> -------------------------------------------------------------------------------
> > parse_clause.c:4093 Limit error branch (Variable limit
> test)
> > nodeWindowAgg.c:5623 null_slot usage (NULL boundary test)
> >
> >
> > 7. CONCLUSION
> > -------------
> >
> > Test Quality: GOOD
> >
> > Core functionality is thoroughly tested with comprehensive error case
> > coverage.
> >
> > The patch is well-implemented within its defined scope. Identified issues
> > include
> > one major concern (combinatorial explosion) and minor style/documentation
> > items.
> >
> > Recommended actions before commit:
> > 1. Add work_mem-based memory limit for pattern combinations (required)
> > 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> > 3. Fix code style issues (optional but recommended)
> > 4. Fix documentation typo (trivial)
> > 5. Add tests for variable limit and NULL handling (optional)
> >
> > Points for discussion (optional):
> > 6. Incremental matching with streaming NFA redesign
> >
> >
> > Attachment:
> > - coverage.tgz (gcov HTML report, RPR-modified code only)
> >
> >
> > ---
> > End of Report
> >
> > 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii(at)postgresql(dot)org>님이 작성:
> >
> >> Attached are the v33 patches for Row pattern recognition. The
> >> difference from v32 is that the raw parse tree printing patch is not
> >> included in v33. This is because now that we have
> >> debug_print_raw_parse.
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS K.K.
> >> English: http://www.sraoss.co.jp/index_en/
> >> Japanese:http://www.sraoss.co.jp
> >>
>
Best regards,
Henson
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Treat | 2026-01-02 02:54:06 | Re: DOC: fixes multiple errors in alter table doc |
| Previous Message | Tatsuo Ishii | 2026-01-02 02:16:15 | Re: Row pattern recognition |