| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, jian(dot)universality(at)gmail(dot)com |
| Cc: | pgsql-hackers(at)postgresql(dot)org, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-06-23 14:27:15 |
| Message-ID: | CAAAe_zChW=nM9J18wdF2wMa2KesFO53fkbVRudkJ_bstdxzfOQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
This is v50, folding in the outstanding review. From my side there are no
known issues left, so I think it makes a good starting point for the kind of
thorough public review the feature should get before commit.
On scope: the feature is almost entirely additive. It adds no SQL-visible
objects -- no DDL, catalog entries, or built-in functions -- and no on-disk
or catalog-format change, so it needs no catversion bump. Row pattern
recognition is reached only through the new window-specification grammar and
the window-function machinery it extends; code that does not use it follows
the same paths as before. The footprint is narrow by design, so the risk to
existing behaviour stays low. It is not meant for back-porting, but it is
contained enough that a back-port would be feasible -- a measure of how
little it disturbs anything else.
That said, the feature is not simple inside, and review would pay off most
on
the parts that took the most care: the NFA matcher (alternation, reluctant
quantifiers, nullable / empty-match handling, and the dormant matches in the
tuplestore spool); the executor's tuple-slot navigation (the
EEOP_RPR_NAV_SET
/ _RESTORE steps and the slot caching behind PREV / NEXT / FIRST / LAST);
and
the planner/executor optimizations -- context absorption and the
reduced-frame
navigation that lets the tuplestore be trimmed early. The surrounding
plumbing is, by comparison, mechanical.
The feature is unchanged since v49 and is re-attached as v49-0001..0009 so
the series applies on master; the work since v49 is the increment
v50-0001..0020 on top.
A note on the posting itself first: the v49 increment went out as
nocfbot-*.txt attachments, which cfbot does not apply, and a later
coverage-report attachment (no patches inside) left the entry with nothing
applicable -- so cfbot has been building plain master, and its current green
is just master's, with no RPR patch applied. v50 goes out as a normal
format-patch series so cfbot picks it up and actually exercises RPR.
One ASAN task is likely to go red once it does: the libLLVM 19 + ASAN JIT
crash (CF 6870) is still unfixed in master, and RPR exercises the JIT paths,
so the ASAN build will hit it -- a toolchain/master issue, not RPR. Matheus
Alcantara's meson fix (exclude sanitizer flags from JIT bitcode generation)
addresses it but is not merged yet; it is not folded into this series.
First, two cleanups splitting changes unrelated to RPR out of the feature
patch:
v50-0001 Remove blank-line changes unrelated to row pattern recognition
v50-0002 Remove unnecessary includes from the row pattern recognition
patch
The fixes (v50-0003..0005, all behavior-changing):
v50-0003 Recognize row pattern navigation operations by name in DEFINE
v50-0004 Use a dedicated ExprContext for RPR DEFINE clause evaluation
v50-0005 Drive RPR row pattern matching once per row
Review of v48 (Jian He, and Tatsuo Ishii), v50-0006..0013 --
behavior-neutral
except where tagged [behavior change]:
v50-0006 Tidy up row pattern recognition plumbing
v50-0007 Further tidy up row pattern recognition plumbing
v50-0008 Refactor transformDefineClause in row pattern recognition
[behavior change]
v50-0009 Replace a bare block with an else in the RPR DEFINE clause
walker
v50-0010 Rename loop index variables in row pattern deparse helpers
v50-0011 Rename absorption "judgment point" to "comparison point" in
comments
v50-0012 Improve comments, documentation, and naming for row pattern
recognition
v50-0013 Document eval_nav_offset_helper's NULL/negative offset handling
New since v49 -- the agreed Jian He follow-ups and the standing quality work
(v50-0014..0020):
v50-0014 Tidy up the row pattern unbounded-quantifier sentinel
v50-0015 Simplify row pattern compilation by passing the WindowClause
v50-0016 Reword the row pattern variable-limit error
v50-0017 Reformat row pattern regression tests for readability
v50-0018 Add row pattern recognition coverage tests and tidy unreachable
code
v50-0019 Free RPR NFA states with pfree() under USE_VALGRIND
v50-0020 Clarify row pattern recognition comments on "step" and no_equal
For traceability, where each patch came from:
patch proposer proposal
-------- -------------
--------------------------------------------------
v50-0001 Henson drop blank-line churn unrelated to RPR
v50-0002 Henson drop unnecessary includes
v50-0003 Henson nav namespace collision; (f).prev as ordinary
function
v50-0004 Henson dedicated ExprContext for DEFINE evaluation
v50-0005 Henson drive row pattern matching once per row
v50-0006 Jian He tidy RPR plumbing
v50-0007 Henson further plumbing tidy
v50-0008 Jian He refactor transformDefineClause
v50-0009 Jian He bare block -> else in the DEFINE walker
v50-0010 Tatsuo Ishii rename deparse loop/index variables
v50-0011 Jian He "judgment point" -> "comparison point" wording
v50-0012 Jian He comment/doc clarity batch + RPR_COUNT_INF rename
v50-0013 Jian He document eval_nav_offset NULL/negative offset
handling
v50-0014 Jian He abstract the quantifier INF bound / sentinel
v50-0015 Jian He pass the WindowClause into pattern compilation
v50-0016 Jian He fold the maximum into the variable-limit message
v50-0017 Henson reformat the regression tests
v50-0018 Henson coverage tests + tidy unreachable code
v50-0019 Henson free NFA states with pfree() under USE_VALGRIND
v50-0020 Jian He clarify the "step" and no_equal comments
Disposition of the latest review (Jian He -- the on-list v48 follow-ups and
the off-list thread that produced v50-0020):
proposal disposition
------------------------------------
-------------------------------------
RPR_QUANTITY_INF sentinel unify accepted -> v50-0014
pass WindowClause; inline accepted -> v50-0015
collectDefineVariables /
tryUnwrapSingleChild
variable-limit error reword accepted -> v50-0016 (kept errmsg +
errdetail, not a single message)
define "step"; reword the no_equal accepted -> v50-0020 (off-list)
comment
"nullable" -> match_empty rejected -- "nullable" is the
standard
automata term
unify DEFINE qualified column-ref rejected -- keep the dedicated
wording
error messages (the range-variable distinction is
needed for the qualifier work)
validateRPRPatternVarCount -> deferred -- name to be settled with
preprocessRPRPattern the qualifier patch
(A B+)+ "variable-length element" rejected -- the original is
accurate;
-> "quantifier is different" cf. (A{2} B{3})+, which is
absorbable
(off-list)
Quality work (Linux), each sent as its own follow-up on this thread:
- Valgrind (leak / use-after-free): no errors across parse, NFA build,
pattern scan, EXPLAIN and node serialization; exercises the DEFINE
ExprContext fix, whose reproducer is cassert-only.
- gcov coverage: modified-line 98.4% (functions 100%); the remaining
uncovered lines are idiomatic defensive guards (enum-default branches,
pg_unreachable(), the public windowapi.h relpos guards) kept on purpose.
Both figures were measured at v50-0019; v50-0020 is comment-only, so they
carry over unchanged. Each report goes out as a separate message so the
details and attachments stay out of this patch mail.
Longer term, and out of scope for this CF entry:
- Smaller documentation and error-message follow-ups (glossary, the
bounded-quantifier message, README wording).
- Short-circuit / tri-state DEFINE evaluation, as a separate series.
- SEEK clause support (SQL:2016).
- Empty pattern PATTERN () -- correctly rejected today; deferred because
its empty-match semantics (SHOW/OMIT EMPTY MATCHES) are tied to the
still-out-of-scope MEASURES clause.
- Relaxing the DEFINE-subquery over-rejection where the standard permits
it.
- Prefix-pattern absorption (an optimization; split out as its own
series).
- R010 (MATCH_RECOGNIZE in the FROM clause) and the shared RPRContext it
would back.
Please let me know if any of the slicing or grouping looks off.
Thanks again to Jian and Tatsuo for the careful review.
Best regards,
Henson
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Henson Choi | 2026-06-23 14:28:18 | Re: Row pattern recognition |
| Previous Message | Tom Lane | 2026-06-23 14:12:14 | Re: for loop variable fixes |