| From: | jian he <jian(dot)universality(at)gmail(dot)com> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-06-02 05:46:22 |
| Message-ID: | CACJufxHVADC8e77pnQxSZRk7SYHCZFk6ZCM2HfTsKyD_kUji0A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jun 1, 2026 at 1:35 PM Henson Choi <assam258(at)gmail(dot)com> wrote:
>
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
> #define RPR_VARID_MAX 251
> #define RPR_VARID_BEGIN 252 /* control codes 252..255 */
> ... END 253, ALT 254, FIN 255
>
> RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX) /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
> if (list_length(*varNames) >= RPR_VARID_MAX) /* reject the 252nd */
> ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a documented
> reserve, macro unchanged. No follow-up commit. The one free slot is then
> on hand for a single future control code, should one ever be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a few
> lines in parse_rpr.c / rpr.h plus the boundary test. But it spends the
> last free slot, so a future control code would then force either a
> compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
> (doubling history). Maximal variables now, the control question deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
> vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
> (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so history
> stays 1 byte and (2)'s fork never arises. Same edit shape as (2); costs
> only the nominal drop to 240 variables -- but it is a narrowing, so free
> only pre-release.
>
> Which would you prefer?
>
3.
240 variables is enough, as each variable supports multiple complex AND/OR
conditions. Additionally, since PostgreSQL regular expressions use 14 special
characters, reserving the remaining ones in advance is a future-proof approach.
----------------------------------------------------
src/backend/executor/README.rpr
XII-4. Memory Pool Management
Choice: Custom free list
Rationale:
- NFA states are created and destroyed in large numbers per row
- Avoids palloc/pfree overhead
- State size is variable (counts[] array), but within a single query
maxDepth is fixed, so all states have the same size
It would be better simply to mention that:
RPRNFAState and RPRNFAContext are allocated in a partition-lifespan
memory context; they will be destroyed in release_partition.
--------------------------
in ExecRPRFreeContext:
{
if (ctx->states != NULL)
nfa_state_free_list(winstate, ctx->states);
if (ctx->matchedState != NULL)
nfa_state_free(winstate, ctx->matchedState);
}
If ctx->matchedState points to one of the states already in ctx->states, will
nfa_state_free() be called on the same RPRNFAState twice? Is this double-free
permitted, or do we have a mechanism in place to guard against it?
--------------------------
In ExecRPRProcessRow
if (currentPos == ctxFrameEnd)
{
/* Frame boundary reached: force mismatch */
nfa_match(winstate, ctx, NULL);
continue;
}
If I comment out the CONTINUE, the entire regression still succeeds.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-06-02 05:57:09 | Re: Fix regression in vacuumdb --analyze-in-stages for partitioned tables |
| Previous Message | Kyotaro Horiguchi | 2026-06-02 05:29:18 | Re: pg_rewind does not rewind diverging timelines |