Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>, jian(dot)universality(at)gmail(dot)com
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-06-01 00:40:40
Message-ID: CAAAe_zBO7wU6JEuQJ246PzS=B63ufcYUQjovEfwsQvCZdaM7tg@mail.gmail.com
Lists: pgsql-hackers

Hi Tatsuo, Jian,

Thanks, Tatsuo -- your two notes settle both open questions. Answering both
inline below, with a note at the end on the follow-up work.

> Basically I think Jian's idea is good. In addition to the size reason
> above, we would have less code changes when we adapt existing R020
> codes to R010.
>
> However it will need a wide code change as Henson said. I would like
> to focus on stabilizing our code for now. Therefore I would not want
> the refactoring in v48.

Agreed -- out of v48, stabilize first. On R010, I'd treat it as the design
lens, not the schedule: it's far out, so rather than hold RPRContext back
until then, I'd do the consolidation at a sensible point after v48 but shape
it against R010 (what is shared versus per-context, how the fields group),
so one engine core can later back both the R020 window path and R010
without a second reshape. Not v48 -- on its own schedule once we're stable.

I'll admit the R010 connection had been nagging at me for a while without a
clean answer, and Jian's consolidation suggestion turns out to land right on
it: framed as a size win, it's really the structural move that opens the
R010 path -- that reframing is his. Which is why I'd rather shape it
deliberately, as the shared engine core, than fold it in as churn.

> Although I don't have any particular strong preferences, keeping
> "absorption" for the runtime concept sounds good to me.

Good -- "absorption" stays reserved for the runtime context-equivalence
collapse, and the README explains it in one place.

> For AST level name changing, "prefix/suffix merging" seems to be
> already used in other areas according to Google: LLM, Linker, and
> string manipulation in DNA. In the normal expression engine area, it
> looks like "flattening nested quantifiers" or "quantifiers reduction"
> are used for the case. So, for example, "prefix/suffix quantifiers
> reduction" seems to be more appropriate? (If you don't mind it's too
> long) In any case, I would like to respect your opinion.

Thanks -- you're right that "merging" is well-worn elsewhere, and I'll be
honest that "prefix/suffix merging" isn't a term I'd defend on the merits.
Keeping it for v48 is really a stopgap to contain the ripple: the sibling
Phase-1 rewrites are already named "consecutive variable / group / ALT
merging", so switching to the "flattening / reduction" family would force
renaming those too for consistency. So I'd treat the term itself as
genuinely open -- your "flattening / reduction" neighborhood is the right
one -- and converge on the established academic naming as the paper I'm
preparing on the algorithm, together with a university research group,
takes shape. For now I'd ship "prefix/suffix merging" only to keep the
README internally consistent, and fold the settled term into the glossary
pass once the paper lands on it. A doc-level name is cheap to revise later.

Both of these -- the RPRContext reshape and the naming/terminology -- are
"after v48, by discussion" items, and they aren't the only ones. Rather than
pin down the scope of that follow-up now, I'd suggest we pick it up together
once v48 is stable.

Thanks again,
Henson
