Re: Row pattern recognition

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-05-30 06:45:25
Message-ID: CACJufxEsaU8GQ4yeXTWhAO8VjbrZTh5CpvUqz=4a3T0Cwz44pA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

-- Consecutive VAR merge: A A+ -> a{2,}
-- Tests line 251: child->max == RPR_QUANTITY_INF branch in mergeConsecutiveVars
-- prev: A{1,1} (finite), child: A+ (infinite) triggers line 251 evaluation
EXPLAIN (COSTS OFF)
SELECT COUNT(*) OVER w FROM rpr_plan
WINDOW w AS (ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (A A+) DEFINE A AS val > 0);

-- Consecutive VAR merge: A A+ -> a{2,}
-- Tests line 251: child->max == RPR_QUANTITY_INF branch in mergeConsecutiveVars
-- prev: A{1,1} (finite), child: A+ (infinite) triggers line 251 evaluation
EXPLAIN (COSTS OFF)
SELECT COUNT(*) OVER w FROM rpr_plan
WINDOW w AS (ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (A A+) DEFINE A AS val > 0);

Comments like "Tests line 251" should not appear in SQL, since the
line number would change.
So, we need to rephrase this comment. Please don't delete it, it may
clarify the patch for future readers.

/*
* nfa_add_state_unique
*
* Add a state to ctx->states at the END, only if no duplicate exists.
* Returns true if state was added, false if duplicate found (state is freed).
* Earlier states have better lexical order (DFS traversal order), so
existing wins.
*/
static bool
nfa_add_state_unique(WindowAggState *winstate, RPRNFAContext *ctx,
RPRNFAState *state)

The "at the END" confuses me. It may also refers to RPR_VARID_END /
RPRElemIsEnd.
How about something like:
"Add the state to the end of the ctx->states linked list, but only if
a duplicate state is not already present."

I think a bunch of these uppercase ENDs should just be lowercase "end",
capitalizing them may collides visually with RPR_VARID_END and RPRElemIsEnd and
makes the comments harder to read.

In nfa_match:
"Non-VAR elements (ALT, END, FIN) are kept as-is for advance phase."

Here END does mean RPRElemIsEnd, right? That one's probably fine as uppercase
since it's listed alongside other element kinds.

But then:
/*
* Evaluate VAR elements against current row. For VARs that reach max
* count with END next, advance through END chain inline so absorb phase
* can compare states at judgment points.
*/
I'm think END here means the tail of the RPRNFAContext->RPRNFAState linked
list, not the element kind -- which is exactly the ambiguity I'm worried about.

Could we lowercase the list-end ones and keep uppercase only when actually
referring to the END element kind?

The comment under nfa_advance_var is confusing (for me).
/* After a successful match, count >= 1, so at least one must be true */

nfa_advance_var doesn't actually know anything about match status (i
think), it can be reached with count == 0.
I added this elog(INFO) to confirm:
if (count < 1 && currentPos > -1)
elog(INFO, "should not reach, count is %d, currentPos=%ld",
count, currentPos);

...and it does fire, so the comment's premise isn't quite right.

------------------------------------------------
please check the attached minor refactoring.
1. Refactor struct WindowAggState: Remove the nfaStateSize and nfaVisitedNWords
fields because they are constant, and we can easily compute it, seems
not necessary to stay in WindowAggState.

2. Rename nfa_context_alloc() to nfa_context_make(), nfa_state_alloc()
to nfa_state_make(), nfa_state_create() to nfa_state_clone().
Rationale: We have makeNode, changing "alloc" to "make" would be more
intuitive, IMHO.
In tablecmd.c, we have CloneForeignKeyConstraints, which is based on
existing information, creating a new node, here we are doing the same
in
nfa_state_create.

3. Refactor functions: nfa_advance_alt, nfa_advance_begin,
nfa_advance_end, and nfa_advance_var.
Because the (RPRPatternElement *elem) parameter is unnecessary.

--
jian
https://www.enterprisedb.com/

Attachment Content-Type Size
v47-0002-remove-nfa-related-function-name.no-cfbot application/octet-stream 9.0 KB
v47-0001-remove-nfaStateSize-and-nfaVisitedNWords.no-cfbot application/octet-stream 4.9 KB
v47-0003-nfa-refactor-argument.no-cfbot application/octet-stream 4.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2026-05-30 08:05:25 Re: injection_points: Switch wait/wakeup to use atomics rather than latches
Previous Message Michael Paquier 2026-05-30 04:47:52 Re: Copy-paste error in hash_record_extended() — args[1].isnull not set