Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-02-08 16:05:10
Message-ID: CAAAe_zBpgH_06Utd9r1rVYDLaF-UuTtT1_W69bVeEP4MBzvDBQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry, I forgot to attach the patch file in my previous email.

2026년 2월 7일 (토) PM 2:41, Henson Choi <assam258(at)gmail(dot)com>님이 작성:

> Hi Tatsuo,
>
> +-- Filter function to normalize Storage memory values only (not NFA
>> statistics)
>> +-- Works for text, JSON, and XML formats
>>
>> What about add reasoning why only storage memory values are normalized
>> something like:
>>
>> -- NFA statistics should not change between platforms.
>
>
> Done. I've updated the comment to:
>
> -- Filter function to normalize Storage memory values only (not NFA
> statistics).
> -- NFA statistics should not change between platforms; if they do, it could
> -- indicate issues such as uninitialized memory access.
> -- Works for text, JSON, and XML formats.
>
> This round focused primarily on reviewing explain.c. I believe we're
> getting close to the final version now. Only the nodeWindowAgg.c
> code review remains on my side.
>
> For the next submission, would it be helpful if I resend all the
> incremental patches I've already sent since v42 together as separate
> per-patch files?
>
> I've also made some updates to the documentation (advanced.sgml and
> ref/select.sgml). Could you take a look and let me know if there are
> any mistakes or if the changes look good to you?
>
> I've sent several incremental patches on top of v42, and the attached
> patch is the next one in that series. Here are the changes:
>
> Bugs fixed:
>
> - Fix server crash in fillRPRPatternAlt() when an inner ALT is the
> last element of an outer ALT's branch. Both alternations tried to
> set the 'next' pointer on the same endpoint element, triggering
> Assert(pat->elements[endPos].next == RPR_ELEMIDX_INVALID). Fix by
> redirecting all elements in the branch that share the old target to
> point to the outer ALT's afterAlt instead.
> (Test: PATTERN (C (A | B) | D) -- inner ALT at end of outer branch)
>
> - Add BEGIN element to fillRPRPatternGroup(). Previously only END was
> emitted for quantified groups, so groups with min=0 had no skip path
> to bypass the group content. Change to BEGIN/END pairs where
> BEGIN.jump points past END (skip) and END.jump points to the first
> child (loop-back). Add nfa_advance_begin() to the NFA executor for
> group entry and skip handling.
> (Test: PATTERN (A ((B | C) (D | E))* F?) -- * group matching 0 times)
>
> - Fix deparse output producing wrong parenthesization for nested ALT
> patterns. The flat-loop approach could not correctly suppress double
> parentheses for single-ALT groups or handle depth transitions around
> alternation separators. Refactor into mutual recursion:
> deparse_rpr_elements, deparse_rpr_group, deparse_rpr_alt, and
> deparse_rpr_var.
>
> - Fix deparse separator ordering for nested ALT: the ' | ' separator
> was emitted before closing inner parentheses, producing output like
> '(c (a | b | )d)' instead of '(c (a | b) | d)'. Close parens to
> match separator depth before emitting ' | '.
>
> - Fix missing ALT separator registration in deparse_rpr_alt() when an
> ALT is the first element of an outer ALT's branch. The code only
> checked elem->next (always -1 for ALT markers) but not elem->jump,
> which carries the outer branch's separator position.
>
> - Fix altEndPositions/altBranchStarts length mismatch caused by the
> 'if (*idx > branchStart)' guard that skipped empty branches. Remove
> the guard so both lists always have matching entries.
>
> - Fix RPRPatternNode.reluctant initialization in gram.y. ALT, SEQ,
> VAR, and GROUP primary productions used false (0) or left it
> uninitialized (defaulting to 0 from palloc0), but 0 is a valid
> ParseLoc meaning "location at offset 0". Change all four creation
> sites to use -1 (no location), matching the convention used by
> ParseLoc throughout the parser.
>
> - Fix reluctant quantifier display in both explain.c and ruleutils.c.
> A bare variable with reluctant ? (e.g. A?) was displayed as just 'a'
> since there was no quantifier to attach ? to. Now emit '{1}?' to
> make the reluctant marker unambiguous.
>
> New optimization:
>
> - Add mergeConsecutiveAlts() to the SEQ optimization pipeline. After
> GROUP{1,1} unwrap, bare alternations like (A | B) become ALT nodes
> in the SEQ. This step detects consecutive identical ALT nodes and
> wraps them in a GROUP with the appropriate quantifier, e.g.
> (A | B) (A | B) (A | B) -> (A | B){3}. Combined with the existing
> mergeGroupPrefixSuffix, patterns like (A | B) (A | B)+ (A | B)
> further reduce to (A | B){3,}.
>
> - Extend tryMultiplyQuantifiers() to fold nested GROUP quantifiers
> (e.g. ((A B)+)* -> (A B)*) in addition to VAR quantifiers.
>
> Other changes:
>
> - Add RPR_VARID_BEGIN (252) to rpr.h for the new control element type.
> Reduce RPR_VARID_MAX from 252 to 251 accordingly and update the
> maximum-variables boundary tests.
>
> - Add RPR_DEPTH_NONE (255) as sentinel for top-level elements that
> have no enclosing group. Reduce RPR_DEPTH_MAX to 254 accordingly.
>
> - Add BEGIN handling to computeAbsorbabilityRecursive() so that
> absorbability flags propagate correctly through group boundaries.
>
> - Extract show_rpr_nfa_stats() from show_windowagg_info() for NFA
> statistics display.
>
> - Replace defensive NULL check in collectPatternVariables() with
> Assert, since the caller guarantees a non-NULL pattern.
>
> - Pair each EXPLAIN test query in rpr_explain.sql with a
> pg_get_viewdef() check via CREATE TEMP VIEW, verifying that both
> the ruleutils (parse tree) and explain (bytecode) deparse paths
> produce consistent PATTERN output.
>
> - Add comment to rpr_explain_filter() explaining why only Storage
> memory values are normalized: NFA statistics should not change
> between platforms, and variation would indicate issues such as
> uninitialized memory access.
>
> Also add EXPLAIN test cases for nested ALT patterns, consecutive ALT
> merging, and ALT prefix/suffix merging.
>
> On a side note, I feel that the automata characteristics of RPR --
> state transitions, context management, absorbability, etc. -- tend to
> require a much larger volume of test cases than typical RDBMS/SQL
> features. This seems unavoidable given the combinatorial nature of
> NFA execution.
>
> Best regards,
> Henson
>

Attachment Content-Type Size
Fix-RPR-crash-and-refactor.txt text/plain 256.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-02-08 16:24:44 Re: Use pg_current_xact_id() instead of deprecated txid_current()
Previous Message Shinya Kato 2026-02-08 12:27:10 Re: Use pg_current_xact_id() instead of deprecated txid_current()