Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-04-13 22:52:25
Message-ID: 20260414.075225.867535045131251890.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Henson,

> Hi Tatsuo,
>
> When I started this work a little over three months ago, I had
> no idea it would grow into what it is today. There is still
> much to refine, but I would not have come this far without
> your guidance and support along the way.

I had no idea too :-) I was struggled with RPR develpment issues,
especially on an NFA engine and navigation operations (PREV/NEXT
etc.). You have solved both of the issues.

> This series completes RPR navigation support, including
> PREV/NEXT, FIRST/LAST, and compound navigation such as
> PREV(FIRST(col)).
>
> Attached are 31 incremental patches on top of v46 (replacing
> the previous 15-patch set). Here is a summary of the changes
> from the previous version:
>
>
> 0001-0007: Unchanged from the previous set.

Here are the review for 0001-0007. I will continue to review rest of
the patches.

0001 was already reviewed (LGTM).
0002-0003 looks good to me.
0004 was already reviewed (LGTM).
0005 thanks for review.
0006

+ * If it's a window function referencing a window clause with RPR (Row
+ * Pattern Recognition), don't remove it. Even when the window

As "RPR" already appears in the prevous comment block above, the "RPR"
appearing in the fist comment block should be "RPR (Row Pattern
Recognition)" and "RPR" here should be without "(Row Pattern
Recognition)".

0007:

diff --git a/src/test/regress/expected/rpr_integration.out b/src/test/regress/expected/rpr_integration.out
new file mode 100644
index 00000000000..a21ac5a8588
--- /dev/null
+++ b/src/test/regress/expected/rpr_integration.out
+-- Non-RPR: frame is optimized (RANGE -> ROWS conversion, etc.)
+-- RPR: frame must stay as specified (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

For these 2 lines I feel a little bit hard to parse. Probably better
to be written as complete sentences, rather than bullet points. For
example,

+-- Non-RPR case: frame can be optimized (RANGE -> ROWS conversion, etc.)
+-- RPR case: frame must stay as specified (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)

Better to have a blanck line here for readability.

+-- Non-RPR window with default frame -> frame optimization applied

+-- ============================================================
+-- B1. RPR + CTE
+-- ============================================================
+-- CTE with RPR, outer query aggregates

Better to comment what we want to test (or expect).

+-- Multiple CTE references

Better to comment what we want to test (or expect).

+-- ============================================================
+-- B2. RPR + JOIN
+-- ============================================================

Better to comment what we want to test (or expect).

+-- ============================================================
+-- B3. RPR + Set operations
+-- ============================================================

Better to comment what we want to test (or expect).

+-- ============================================================
+-- B8. RPR + Incremental sort
+-- ============================================================
+-- Incremental sort may be used when data is partially sorted.
+CREATE INDEX rpr_integ_id_idx ON rpr_integ (id);
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF)
+SELECT id, val, count(*) OVER w AS cnt
+FROM rpr_integ
+WINDOW w AS (ORDER BY id
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ PATTERN (A B+)
+ DEFINE B AS val > PREV(val));
+ QUERY PLAN
+-------------------------------------------------------------------------------
+ WindowAgg
+ Window: w AS (ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
+ Pattern: a b+
+ -> Index Scan using rpr_integ_id_idx on rpr_integ
+(4 rows)

No incremental sort in this plan?

+-- ============================================================
+-- B9. RPR + Volatile function in DEFINE
+-- ============================================================
+-- Volatile functions must be evaluated per-row, not optimized away.

Can you elaborate how the test below confirm random() is not optimized
away?

+SELECT id, val, count(*) OVER w AS cnt
+FROM rpr_integ
+WINDOW w AS (ORDER BY id
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ PATTERN (A B+)
+ DEFINE B AS val > PREV(val) AND random() >= 0.0)
+ORDER BY id;
+ id | val | cnt
+----+-----+-----
+ 1 | 10 | 2
+ 2 | 20 | 0
+ 3 | 15 | 2
+ 4 | 25 | 0
+ 5 | 5 | 3
+ 6 | 30 | 0
+ 7 | 35 | 0
+ 8 | 20 | 3
+ 9 | 40 | 0
+ 10 | 45 | 0
+(10 rows)

0008 and beyond will be reviewd in separate emails.

> I am planning to submit PREFIX pattern absorption [3]
> as a separate patch after this set is committed. The
> changes are confined to the Absorb phase with no
> external dependencies, so it can be submitted
> independently. The design itself has some complexity
> (shadow path tracking), and I would like to refine it
> further until I am fully confident in the approach.

Ok.

> With this set I have covered the features I could
> foresee. Whether anything else belongs in scope is
> your call. From here, I think the emphasis should
> shift from adding features to raising overall
> quality.

Agreed.

> If there are directions you think should be explored, I am happy to
> work on them. Otherwise, I plan to focus on code review, test
> coverage, and stabilization, and will continue to submit bug fixes
> and test additions as I find them.

One thing I want to discuss is, how to deal with the cases when
volatile functions are used in the DEFINE clause. As far as I know,
the SQL standard does not prohibit to use volatile functions there,
but we may need to prohibit to use it for our implementation reasons.

For reference, I studied how non RPR part in WindowAggs.c handles
volatile functions.

1. When searching for identical window functions, if they are volatile
functions, they are considered different window functions.

2. When evaluating the frame offset expression, it is evaluated only once
each time a scan/rescan occurs.

3. When checking whether moving aggregation is available, the
volatility of the window function is checked. If it is volatile,
moving aggregation is disabled.

> I also very much welcome refactoring or restructuring
> suggestions from a committer's maintainability
> perspective. If anything in the current code will be
> painful to maintain long-term, I would much rather
> address it now than leave it for after commit.

Ok, I will let you know when I find anything.

> For stabilization, I am thinking of the same approach
> as in early January: gcov analysis, Valgrind runs, and
> expanding the test suite.
>
> **WITH THE PLANNER**, the biggest risk is what we do
> not know we are missing -- there are too many possible
> feature interactions to enumerate up front. To surface
> these, I plan to run RPR-only tests under gcov and
> review the uncovered planner paths: any line that RPR
> could plausibly reach but does not becomes a candidate
> risk, and a candidate for a new test. That turns the
> unknown into a concrete review list rather than a guess
> about which interactions to try. Does that seem like a
> good direction?

Sounds like a good direction.

> If anything comes to mind that should be added or
> fixed -- at any time -- please let me know.

Sure, I will.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2026-04-14 00:03:47 Re: EXCEPT TABLE - Case inconsistency for describe \d and \dRp+
Previous Message Michael Paquier 2026-04-13 22:18:33 Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired