Re: Row pattern recognition

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-06-02 05:46:22
Message-ID: CACJufxHVADC8e77pnQxSZRk7SYHCZFk6ZCM2HfTsKyD_kUji0A@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 1, 2026 at 1:35 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
>
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
> #define RPR_VARID_MAX 251
> #define RPR_VARID_BEGIN 252 /* control codes 252..255 */
> ... END 253, ALT 254, FIN 255
>
> RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX) /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
> if (list_length(*varNames) >= RPR_VARID_MAX) /* reject the 252nd */
> ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a documented
> reserve, macro unchanged. No follow-up commit. The one free slot is then
> on hand for a single future control code, should one ever be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a few
> lines in parse_rpr.c / rpr.h plus the boundary test. But it spends the
> last free slot, so a future control code would then force either a
> compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
> (doubling history). Maximal variables now, the control question deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
> vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
> (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so history
> stays 1 byte and (2)'s fork never arises. Same edit shape as (2); costs
> only the nominal drop to 240 variables -- but it is a narrowing, so free
> only pre-release.
>
> Which would you prefer?
>

3.

240 variables is enough, as each variable supports multiple complex AND/OR
conditions. Additionally, since PostgreSQL regular expressions use 14 special
characters, reserving the remaining ones in advance is a future-proof approach.

----------------------------------------------------
src/backend/executor/README.rpr

XII-4. Memory Pool Management
Choice: Custom free list

Rationale:
- NFA states are created and destroyed in large numbers per row
- Avoids palloc/pfree overhead
- State size is variable (counts[] array), but within a single query
maxDepth is fixed, so all states have the same size

It would be better simply to mention that:
RPRNFAState and RPRNFAContext are allocated in a partition-lifespan
memory context; they will be destroyed in release_partition.
--------------------------
in ExecRPRFreeContext:
{
if (ctx->states != NULL)
nfa_state_free_list(winstate, ctx->states);
if (ctx->matchedState != NULL)
nfa_state_free(winstate, ctx->matchedState);
}

If ctx->matchedState points to one of the states already in ctx->states, will
nfa_state_free() be called on the same RPRNFAState twice? Is this double-free
permitted, or do we have a mechanism in place to guard against it?
--------------------------
In ExecRPRProcessRow

if (currentPos == ctxFrameEnd)
{
/* Frame boundary reached: force mismatch */
nfa_match(winstate, ctx, NULL);
continue;
}

If I comment out the CONTINUE, the entire regression still succeeds.

--
jian
https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-06-02 05:57:09 Re: Fix regression in vacuumdb --analyze-in-stages for partitioned tables
Previous Message Kyotaro Horiguchi 2026-06-02 05:29:18 Re: pg_rewind does not rewind diverging timelines