| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | jian(dot)universality(at)gmail(dot)com, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-06-02 02:50:39 |
| Message-ID: | 20260602.115039.1897923276330429432.ishii@postgresql.org |
| Lists: | pgsql-hackers |
Hi Henson, Jian,
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId
> bounds. The comment/README side I'm already fixing in the in-progress
> series; whether to also change the bounds is a separate follow-up. As lead
> author that one is ultimately your call, Tatsuo, but I'd welcome Jian's and
> the list's input on it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
> #define RPR_VARID_MAX 251
> #define RPR_VARID_BEGIN 252 /* control codes 252..255 */
> ... END 253, ALT 254, FIN 255
>
> RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX) /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
> if (list_length(*varNames) >= RPR_VARID_MAX) /* reject the 252nd */
> ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a
>     documented reserve, macro unchanged. No follow-up commit. The one free
>     slot is then on hand for a single future control code, should one ever
>     be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a
>     few lines in parse_rpr.c / rpr.h plus the boundary test. But it spends
>     the last free slot, so a future control code would then force either a
>     compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
>     (doubling history). Maximal variables now, the control question
>     deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
>     vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
>     (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so
>     history stays 1 byte and (2)'s fork never arises. Same edit shape as
>     (2); costs only the nominal drop to 240 variables -- but it is a
>     narrowing, so free only pre-release.
>
> The asymmetry: (3) is the only one with a deadline -- a narrowing is
> compatible only before release, while (1)/(2) stay open forever. So the
> question is whether to spend this one free moment to lock in 1-byte control
> headroom (3), or stay minimal now (1)/(2) and take the narrow-or-widen later
> if it is ever needed. My own lean is toward (3): 240 variables is already
> far more than any real pattern will use, so the capacity we give up is
> nominal, while the 12-code buffer closes the narrow-or-widen fork for good
> and keeps match history at one byte -- and it is the one choice that is free
> only now. That said, I'd like the decision to rest on everyone's input --
> Jian's and the list's as much as mine -- with you, Tatsuo, weighing it all
> and making the final call.
>
> Either way, once the feature matures and the final control-code count is
> known, the space can be repacked gap-free -- so none of these is the last
> word.
>
> Which would you prefer?
I'd prefer (3). Yes, I agree that 240 pattern variables is enough.
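
For reference, a minimal sketch of what the layout in (3) might look like. This is not the actual patch: RPR_VARID_MAX and RPR_VARID_BEGIN are taken from the quoted rpr.h excerpt, while RPR_VARID_CONTROL_MASK, rpr_varid_is_var() and rpr_can_add_var() are hypothetical names used only for illustration, and the typedef assumes RPRVarId is a plain uint8_t as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RPRVarId;			/* one byte per element, as in the thread */

/* Hypothetical name: a high nibble of 0xF marks a control code (240..255). */
#define RPR_VARID_CONTROL_MASK	0xF0

#define RPR_VARID_MAX			239	/* option (3): variables use 0..239 */
#define RPR_VARID_BEGIN			252	/* existing sentinels are not moved; */
									/* END 253, ALT 254, FIN 255 follow it */

/* The variable range must end exactly where the control range starts. */
static_assert(RPR_VARID_MAX == RPR_VARID_CONTROL_MASK - 1,
			  "variable range must stop just below the control range");

/* Hypothetical helper: true for variable ids, false for control codes. */
static inline bool
rpr_varid_is_var(RPRVarId id)
{
	return (id & RPR_VARID_CONTROL_MASK) != RPR_VARID_CONTROL_MASK;
}

/*
 * Hypothetical helper: can one more variable be registered when "count"
 * names already exist?  The new variable would receive id == count, so the
 * test is against the highest valid id, which leaves no unassigned hole
 * below the control range.
 */
static inline bool
rpr_can_add_var(int count)
{
	return count <= RPR_VARID_MAX;
}
```

With this shape ids 240..251 are classified as control codes even though nothing uses them yet, which is the 12-code reserve, and the parser-side check accepts exactly 240 variables.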
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp