Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-04-27 08:42:20
Message-ID: 20260427.174220.1939160662649810289.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Henson,

> Thank you for pointing this out. I will analyse A1 and reflect
> the findings in the next revision.

Thanks in advance.

>> Comment to 0009:
>>
>> + /* Mark VAR in visited before duplicate check to prevent DFS loops
>> */
>> + winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |=
>> + ((bitmapword) 1 << BITNUM(state->elemIdx));
>>
>> Why do not use bms here? Maybe winstate->nfaVisitedElems could become
>> quite big?
>>
>
> Thank you for the suggestion. I was not aware of Bitmapset when
> this was written; I will switch nfaVisitedElems to Bitmapset.
>
> On sizing, nfa->nelems is structurally bounded by RPR_ELEMIDX_MAX
> (INT16_MAX), since RPRNFAState.elemIdx is int16 and scanRPRPattern()
> rejects anything past that with "pattern too complex". So the
> bitmap is at most 4 KB per WindowAgg even in the worst case.

Hmm. From the comment from bitmapset.c:
* A bitmap set can represent any set of nonnegative integers, although
* it is mainly intended for sets where the maximum value is not large,
* say at most a few hundred.

Maybe 4 KB is tool large? If so, there's no need to rush to change the
implementation. You can leave it as it is for now.

>> Comment to 0024:
>>
>> + <function>first</function> ( <parameter>value</parameter>
>> <type>anyelement</type> [, <parameter>offset</parameter>
>> <type>bigint</type> ] )
>> + <returnvalue>anyelement</returnvalue>
>> + </para>
>> + <para>
>> + Returns the column value at the row <parameter>offset</parameter>
>> + rows after the match start row;
>> + returns NULL if the target row is beyond the current row.
>> + <parameter>offset</parameter> defaults to 0 if omitted, referring
>> to the
>> + match start row itself.
>> + <parameter>offset</parameter> must be a non-negative integer.
>> + <parameter>offset</parameter> must not be NULL.
>> + Can only be used in a <literal>DEFINE</literal> clause.
>> + </para></entry>
>>
>> "Returns the column value at the row" is not appropriate because
>> first() accepts any expression as the first argument. e.g. first(col1
>> + col2). Instead what about "Returns value evaluated at the row"?
>> Same thing can be said to other row pattern navigation operations.
>> I think they inherits my mistakes in my earlier patch. Sorry for this.
>>
>
> Thank you, and please do not apologise -- I should have caught this
> when I carried the wording across. I will reword the synopses for
> first(), last(), prev(), next() (and any other entry that inherited
> the same phrasing) along the line you suggested.

Thank you.

>> Let's prohibit volatile functions in DEFINE. Although this needs extra
>> checking, it would make proceeding codes and tests simpler. If
>> others, especially Vik and Jacob, think differently, please join the
>> discussion.
>>
>
> Thank you for the decision. Since this is a policy change that may
> still attract further community discussion, I would like to handle
> it as a lower-priority, self-contained follow-up on top of 0031, so
> it can be picked up, held, or revised independently. Tentative
> plan: reject volatile functions in DEFINE during parse-analysis via
> contain_volatile_functions(), fold with the existing offset
> runtime-constant check where the two share traversal, and -- if the
> prohibition lands -- convert the B9 test into an error-case test
> and drop the XXX note.

Your tentative plan looks good to me.

>> > On sequencing, if we do take it up, I would suggest handling it
>> > after the 31-patch set, alongside the README.rpr split as
>> > follow-up work on top of 0031. Whether it ultimately lands
>> > inside v47 or as a separate piece on top does not need to be
>> > decided right now -- there is still room to discuss it as the
>> > review progresses, and I am happy to adjust either way based on
>> > your direction.
>>
>> I too prefer to have it after the 31-patch set.
>>
>
> Thank you, confirmed. Both items (volatile-DEFINE prohibition and
> README.rpr split) will land as follow-up patches on top of 0031,
> after the current series is committed.
>
> One quick question: for the 0007 / 0009 / 0017 / 0024 review fixes
> summarised below, would additional follow-up patches on top of 0031
> be preferable to refreshing the already-reviewed series?

Yes, that's fine. I will apply the delta patch for 0007 / 0009 / 0017
/ 0024 on top of 0031.

> For ease of tracking, here is a one-line summary per patch of the
> changes I plan to fold into the next revision:
>
> 0007: analyse A1 and, if the diagnosis holds, revise it so the
> frame optimization bypass is observable in EXPLAIN
> (adjusting the comment and expected output accordingly).
> 0009: switch WindowAggState.nfaVisitedElems from a bitmapword[]
> to Bitmapset, replacing the manual WORDNUM/BITNUM sites
> with bms_add_member() / bms_is_member() / bms_free().
> 0017: fix the "Checks for" -> "Check for" comment, rewrite the
> called-out passive-voice errmsg() strings (and the similar
> ones nearby) in the active voice, and use Max() at the two
> offset-update sites.
> 0024: reword the navigation synopses (first/last/prev/next) to
> "Returns the value of <value> evaluated at the row ...",
> and replace the non-ASCII ≠ / → in rpr.out with != / ->
> (sweeping the rest of rpr*.out for other non-ASCII).

Confirmed.

> In addition, I plan to follow the current series with two new
> patches on top of 0031 (final numbering deferred until they are
> posted):
>
> new: README.rpr split -- extract the design notes currently
> inlined in execRPR.c into src/backend/executor/README.rpr,
> as previously discussed.
> new: Prohibit volatile functions in DEFINE -- separate,
> self-contained patch as described above; intended as a
> lower-priority follow-up that can be picked up, held, or
> revised independently of the main series depending on how
> the community discussion settles.

Ok.

> Separately, one item is deferred for later discussion rather than
> folded into a patch:
>
> deferred: B7 Recursive CTE XXX in 0007 -- whether this case
> falls under ISO/IEC 9075-2 4.18.5 / 6.17.5 prohibition
> ("Ok, let's discuss on this later on."). Noted, happy
> to pick this up whenever convenient.

Ok.

> Once you let me know your preference on the sequencing question
> above, I will send the corresponding patches as soon as they are
> ready, and post the two follow-ups separately so that they can be
> reviewed and committed on top of the main series.
>
> Thank you again for the thorough review.

Thank you for your effort!
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-04-27 08:46:39 Re: DELETE/UPDATE FOR PORTION OF with rule system is not working
Previous Message SATYANARAYANA NARLAPURAM 2026-04-27 08:41:18 [Patch]: Fix excessive ProcArrayLock acquisitions with subscription max_retention_duration=0