Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-04-27 04:52:43
Message-ID: 20260427.135243.1750053587183825244.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> is added to Cc: I accidentally
forgot to add him. Sorry, Chao.]

Hi Henson,

I looked through 0001-0031 patches.

> Hi Tatsuo,
>
> Thank you for the careful review and for confirming the build on
> your side.
>
> Ok, I will not rebase current v46 and continue to apply your
>> incremental patches and review them, until we agree to ship v47. I
>> may see conflicts while creating v47 patch sets. I will ask help if if
>> your assistance needed. Thanks in advance.
>>
>
> Understood. Please let me know whenever a conflict comes up while
> assembling v47 and I will help resolve it on my side.
>
>
>> I confirmed 0001-0008 applied cleanly and see no compile
>> warning. Regression test passed, no crash.
>>
>
> Thank you for confirming.
>
>
>> > Regarding the README.rpr suggestion from the 0008 review: the
>> > documentation in execRPR.c has dependencies spread across the patch
>> > series, so separating it mid-review would be disruptive. I plan to
>> > split it out as part of the final patch list once all 31 patches have
>> > been reviewed.
>>
>> Ok.
>
>
> Thank you. I will prepare the README.rpr split as a follow-up patch
> after the current 0031 once the series review is complete.
>
> Looking forward to seeing revised incremental patch sets.
>
>
> Attached is the revised v47 series, with the review comments on
> 0006, 0007, and 0008 incorporated. Numbering and subjects are
> unchanged from the previous send.

0001 was already reviewed (LGTM).
0002-0003 looks good to me.
0004 was already reviewed (LGTM).
0005 thanks for review.

> Since the edits in 0006-0008 propagate into the later patches --
> mostly as context-line shifts, and in 0017/0023 as updates to the
> rpr_integration expected output -- I have attached the full
> 0001-0031 set rather than only the updated patches. Applying this
> set on top of v46 should give the same final tree as before, just
> with the review adjustments folded in.

Thank you for taking care of my request.

> Summary of changes from the previous v47:
>
> - 0006 (Fix DEFINE expression handling): per your comment, the
> first occurrence of "RPR" now carries the "(Row Pattern
> Recognition)" expansion and the later occurrence is the bare
> abbreviation.

Confirmed. 0006 Looks good to me.

> - 0007 (Add RPR planner integration tests): overall
> strengthening of the tests and their comments. Each block
> now carries an intent comment describing what the block is
> trying to cover and what the expected plan or result is,
> the "Non-RPR / RPR" comment pair is reworded as complete
> sentences, and B8 is reworked so that the EXPLAIN plan
> actually exercises Incremental Sort.
>
> Because some blocks cover features that are only
> introduced later in the series, a few tests in 0007
> temporarily produce error output. Concretely, B4
> (RPR + prepared statements) uses the 2-argument form
> PREV(val, $1), which is not yet available at the 0007
> point: it is introduced in 0017 (1-slot PREV/NEXT
> navigation), and the accompanying "Nav Mark Lookback"
> EXPLAIN property referenced in the B4 comments is added
> in 0023. The rpr_integration expected-output file is
> updated by 0017 (to clear the PREV/prepared-statement
> errors) and then again by 0023 (to add the
> "Nav Mark Lookback" lines to the EXPLAIN output), so that
> by the end of the series the output is clean.
>
> XXX notes have been added in three places where a test
> records behaviour that may be revisited later:
>
> (1) The subquery-output test documents why the column
> guard in allpaths.c has to be conservative: a column
> referenced only by DEFINE is indistinguishable from
> an exposed column at the targetlist level, so
> remove_unused_subquery_outputs() cannot apply its
> NULL-replacement selectively without risking DEFINE
> evaluation on NULL inputs. The XXX flags this as a
> candidate for a precise follow-up optimization.
>
> (2) The B7 Recursive CTE test carries an XXX noting that
> it is unclear whether this case falls under the
> ISO/IEC 9075-2 4.18.5 / 6.17.5 prohibition, and even
> if it does not, whether a query that does trigger
> the prohibition can be constructed at all. The
> disposition is left to the community.

Ok, let's discuss on this later on.

> (3) The B9 volatile-in-DEFINE test records current
> behaviour using "random() >= 0.0" (structurally
> equivalent to TRUE, so the expected output stays
> deterministic). The XXX flags that if we decide to
> reject volatile functions in DEFINE, this test must
> be converted into an error-case test. See the
> dedicated section at the end of this mail.

Ok.

Comment to A1:

+-- A1. Frame optimization bypass
:
:
+-- Non-RPR window with default frame -> frame optimization applied
+EXPLAIN (COSTS OFF)
+SELECT count(*) OVER w FROM rpr_integ
+WINDOW w AS (ORDER BY id);
+ QUERY PLAN
+-----------------------------------
+ WindowAgg
+ Window: w AS (ORDER BY id)
+ -> Sort
+ Sort Key: id
+ -> Seq Scan on rpr_integ
+(5 rows)

The comment says frame optimization applied. But I don't see any frame
option changes (see how the first query changes the frame options).

I thought a frame optimization is something like this.
EXPLAIN(COSTS OFF)
SELECT row_number() OVER W FROM generate_series(1,5) s(i)
WINDOW w AS (
ORDER BY i
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
);
QUERY PLAN
------------------------------------------------------
WindowAgg
Window: w AS (ORDER BY i ROWS UNBOUNDED PRECEDING)
-> Sort
Sort Key: i
-> Function Scan on generate_series s
(5 rows)

EXPLAIN(COSTS OFF)
SELECT row_number() OVER W FROM generate_series(1,5) s(i)
WINDOW w AS (
ORDER BY i
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (A+)
DEFINE A AS i > 2
);
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg
Window: w AS (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
Pattern: a+"
Nav Mark Lookback: 0
-> Sort
Sort Key: i
-> Function Scan on generate_series s
(7 rows)

> - 0008 (Replace reduced frame map with single match result):
> * Comment block X-1 now uses the full field names
> rpr_match_valid / rpr_match_matched / rpr_match_length.
> * get_reduced_frame_status() sentence rephrased in the
> passive voice, as you suggested:
> "Row's status against the current match result can be
> obtained by calling get_reduced_frame_status()."
> * README.rpr split deferred per the above.

Thanks for the new patch. 0008 looks good to me.

> - 0009-0031: no semantic code changes. Most patches only
> carry context-line shifts from the edits above. The two
> exceptions are 0017 and 0023, whose rpr_integration.out
> hunks are updated to match the revised B4 test in 0007
> (clearing the temporary errors and adding the
> "Nav Mark Lookback" lines, respectively).

Comment to 0009:

+ /* Mark VAR in visited before duplicate check to prevent DFS loops */
+ winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |=
+ ((bitmapword) 1 << BITNUM(state->elemIdx));

Why do not use bms here? Maybe winstate->nfaVisitedElems could become
quite big?

0010-0016 look good to me.

Comment to 0017:
+
+/*
+ * check_rpr_nav_expr
+ * Validate a single RPRNavExpr node by walking its arg and offset_arg
+ * subtrees in a single pass each. Checks for nested PREV/NEXT, missing
+ * column references, and non-constant offset expressions.
+ */

"Checks for" --> "Check for" (no "s").

+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("PREV and NEXT cannot be nested"),
+ parser_errposition(pstate, nav->location)));

errmsg("PREV and NEXT cannot be nested") ->
errmsg("cannot nest PREV and NEXT") (avoid passive voice in error messages)

+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("next() can only be used in a DEFINE clause")));

errmsg("next() can only be used in a DEFINE clause") ->
errmsg("cannot use next() other than in a DEFINE clause")

The same reason above. There are some places where similar changing
necessary.

+ if (offset > maxOffset)
+ maxOffset = offset;

+ if (offset > *maxOffset)
+ *maxOffset = offset;

You can use Max macro here.

0018-0023 look good to me.

Comment to 0024:

+ <function>first</function> ( <parameter>value</parameter> <type>anyelement</type> [, <parameter>offset</parameter> <type>bigint</type> ] )
+ <returnvalue>anyelement</returnvalue>
+ </para>
+ <para>
+ Returns the column value at the row <parameter>offset</parameter>
+ rows after the match start row;
+ returns NULL if the target row is beyond the current row.
+ <parameter>offset</parameter> defaults to 0 if omitted, referring to the
+ match start row itself.
+ <parameter>offset</parameter> must be a non-negative integer.
+ <parameter>offset</parameter> must not be NULL.
+ Can only be used in a <literal>DEFINE</literal> clause.
+ </para></entry>

"Returns the column value at the row" is not appropriate because
first() accepts any expression as the first argument. e.g. first(col1
+ col2). Instead what about "Returns value evaluated at the row"?
Same thing can be said to other row pattern navigation operations.
I think they inherits my mistakes in my earlier patch. Sorry for this.

--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out

+-- match_start=3(30): A=id3, B=id4, FIRST(val)=30≠10 → no match

Non ASCII letter(≠) should not be used.

0025-0031 look good to me.

> On volatile functions in DEFINE (your 2026-04-13 mail): the SQL
> standard itself is relatively narrow here -- the only explicit
> restriction it places is that a navigation function's offset must
> be a runtime constant. Volatile expressions in the DEFINE
> predicate are not forbidden by the standard.
>
> That said, once we move into the pattern-matching machinery, the
> observable semantics of a volatile DEFINE become quite murky.
> Depending on the situation, the DEFINE predicate may be evaluated
> once per row, or separately per active matching context, or some
> combination of both as the NFA explores alternatives. A user
> writing "random() > 0.5" in DEFINE would have a hard time
> predicting, let alone relying on, the evaluation count -- and I
> have not been able to think of a realistic use case that actually
> benefits from this flexibility.

Agreed. The flexibility does not bring benefits to users.

> Given the gap between "permitted by the standard" and "meaningful
> to the user," my own leaning is toward prohibiting volatile
> functions in DEFINE at transformation time, and -- if we go that
> way -- also examining whether the new check would make the
> existing runtime-constant check on the navigation-function offset
> redundant, or whether the two could be folded into a single check
> pass. That said, this is ultimately a policy call, and I would rather defer
> the final decision to your judgment as a senior member of the
> community: does prohibition seem like the right direction, or
> would you prefer to keep the current behaviour in place?

Let's prohibit volatile functions in DEFINE. Although this needs extra
checking, it would make proceeding codes and tests simpler. If
others, especially Vik and Jacob, think differently, please join the
discussion.

> On sequencing, if we do take it up, I would suggest handling it
> after the 31-patch set, alongside the README.rpr split as
> follow-up work on top of 0031. Whether it ultimately lands
> inside v47 or as a separate piece on top does not need to be
> decided right now -- there is still room to discuss it as the
> review progresses, and I am happy to adjust either way based on
> your direction.

I too prefer to have it after the 31-patch set.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message SATYANARAYANA NARLAPURAM 2026-04-27 04:56:42 Re: Fix a server crash problem from pg_get_database_ddl
Previous Message Amit Langote 2026-04-27 04:51:38 Re: ri_LockPKTuple misleading message