Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com
Subject: Re: Row pattern recognition
Date: 2026-06-24 06:04:09
Message-ID: CAAAe_zC71Lv4UFyMBhoK0gAAUfJ97-7fbj0cVShQDYryWCYqqg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi jian,

Thanks for the refactor. I went through it change by change -- the
per-file notes are at the bottom.

A note up front, so the notes don't read as dismissive: at this stage I'd
rather prioritize validated code over better code. The branch has a lot of
coverage and Valgrind behind it, and every change -- even a clean,
equivalent one -- moves a line out from under that. So where I say "drop",
it means "not now", not "not good"; most of these would be welcome as a
follow-up. One piece, though, does need a fix before any of it lands.

That piece is the new guard in computeAbsorbability(): it carries a
behavior change that no test covers.

> Move isAbsorbable initialization from finalizeRPRPattern() into
> makeRPRPattern(), and guard the computeAbsorbabilityRecursive() call in
> computeAbsorbability() so it is only entered when the first element is
> an ALT or has an unbounded quantifier.

First, what "absorbable" means here: the first unbounded repetition of a
fixed-length body reachable from the start of the pattern. This rule is
worth writing down -- in README.rpr and in the comment on
computeAbsorbability() (or wherever fits best) -- so it is stated once,
explicitly (patch attached: nocfbot-absorb-definition-doc).

By that definition the leading A+ in (A+ B){2,3} is absorbable, but the
patch drops it.

The offending check is in computeAbsorbability():

if (RPRElemIsAlt(&pattern->elements[0]) ||
pattern->elements[0].max == RPR_QUANTITY_INF)
computeAbsorbabilityRecursive(pattern, 0, &hasAbsorbable);

It looks only at elements[0], so a bounded outer group's BEGIN (finite max)
fails the check and the recursion that would descend to A+ never runs.

I'm attaching a small test for this (nocfbot-absorb-bounded-outer-test): it
passes on the branch and fails with v50-0001 applied, pinning both the a+"
marker and the absorbed-context count.

For the rest, change by change -- "ok" = take as-is, "drop" = leave out:

src/backend/optimizer/plan/rpr.c

The branch is already validated under coverage and Valgrind, so I'd rather
not re-open verified logic for equivalent refactors -- glad to take those
as a follow-up. In file order:

ok palloc_array / palloc0_array in makeRPRPattern()
result->varNames = palloc_array(char *, numVars);
result->elements = palloc0_array(RPRPatternElement, numElements);

ok move the isAbsorbable = false init into makeRPRPattern()

drop rename beginElem -> elem in fillRPRPatternGroup()
only beginElem was renamed; with endElem kept, the begin/end pair is
no longer symmetric (elem / endElem)

drop rename isFixedLengthChildren -> isFixedQuantifier -- keep the old
name
"fixed-length" is the term everywhere else (README.rpr, the nearby
comments, even this function's own body), and it drops the
"children"
that flags a whole-subtree check

drop the jump-1 traversal rewrite in that function -- equivalent
refactor,
defer to a follow-up

drop rename to start_with_unbounded_quantifier -- keep isUnboundedStart
Lone snake_case among camelCase neighbors -- including
isFixedQuantifier,
renamed in this same patch -- and it reads as a predicate though it
sets flags.

drop start_with_*'s new signature + if/else restructure -- equivalent
refactor; keep the original isUnboundedStart body

drop rename branchFirst -> branchElement -- cosmetic churn, no behavior
change

drop the new computeAbsorbability() guard -- the regression above
if (RPRElemIsAlt(&pattern->elements[0]) ||
pattern->elements[0].max == RPR_QUANTITY_INF)

src/include/nodes/plannodes.h

ok drop the redundant /* T_RPRPattern */ on the type field
NodeTag type;

drop reworded "see isUnboundedStart" block comment -- keep the original
(it still matches now that isUnboundedStart stays)

ok drop the inline isAbsorbable comment -- the block comment above
covers it
bool isAbsorbable;

src/backend/executor/README.rpr

ok spell out the skip mode -- a clear doc fix on its own
(1) SKIP PAST LAST ROW (not SKIP TO NEXT ROW)

drop isUnboundedStart -> start_with_unbounded_quantifier -- follows the
rename

Tatsuo, what's your read on all this -- in particular, hold the equivalent
refactors until after commit, or take them now?

Best regards,
Henson

Attachment Content-Type Size
nocfbot-absorb-bounded-outer-test.txt text/plain 3.8 KB
nocfbot-absorb-definition-doc.txt text/plain 1.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2026-06-24 06:04:27 Re: Support EXCEPT for ALL SEQUENCES publications
Previous Message Bharath Rupireddy 2026-06-24 05:58:47 Re: Add MIN/MAX aggregate support for uuid