Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-01-15 13:28:39
Message-ID: CAAAe_zAUSSiEgNSiwQSRd2FT094rC3BoCJawqERDOv-fZ97rKw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ishii-san

We need to check the *frame" end, not the partition end.
>
> I think your patch relies on !window_gettupleslot() to check whether
> the row exists.
>
> if (!window_gettupleslot(winobj, pos, slot))
> return false; /* No row exists */
>
> But the function only checks the row existence in the current partition:
>
> * Fetch the pos'th tuple of the current partition into the slot,
>
> Thus it is possible that window_gettupleslot() returns true but the
> row is not in the current frame in case that the partition is divided
> into some frames. You need to check the row existence in a frame. For
> this purpose you can use row_is_in_frame().
>
> I ran following query and got the result with v38 (which includes your
> NFA patches).
>
> WITH data AS (
> SELECT * FROM (VALUES
> ('A', 1), ('A', 2),
> ('B', 3), ('B', 4)
> ) AS t(gid, id))
> SELECT gid, id, array_agg(id) OVER w
> FROM data
> WINDOW w AS (
> PARTITION BY gid
> ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
> AFTER MATCH SKIP TO NEXT ROW
> PATTERN (A+)
> DEFINE A AS id < 10
> );
> gid | id | array_agg
> -----+----+-----------
> A | 1 | {1,2}
> A | 2 |
> B | 3 | {3,4}
> B | 4 |
> (4 rows)
>
> I think the second and 4th rows are expected to return some data in
> array_agg colum. In fact v37 patch returns following results for the
> same query:
>
> gid | id | array_agg
> -----+----+-----------
> A | 1 | {1,2}
> A | 2 | {2}
> B | 3 | {3,4}
> B | 4 | {4}
> (4 rows)
>

While investigating the issue you mentioned, I found that the problem
is caused by the context absorption logic, not the frame boundary check.

The absorption logic removes a newer context when
older->matchEndRow >= newer->matchEndRow:

Example with pattern A+:
- Context at row 1: matchEndRow = 2 (match {1,2})
- Context at row 2: matchEndRow = 2 (match {2})
→ Newer context absorbed, row 2's result lost

With SKIP TO NEXT ROW, each row should produce its own independent match.
The absorption was incorrectly removing these valid overlapping matches.

I modified the absorption logic to only apply to SKIP PAST LAST ROW mode,
where it provides performance benefits without affecting correctness.

For SKIP TO NEXT ROW, absorption is now disabled, allowing each row
to produce independent overlapping matches as expected:

gid | id | array_agg
-----+----+-----------
A | 1 | {1,2}
A | 2 | {2}
B | 3 | {3,4}
B | 4 | {4}
(4 rows)

I'm attaching 3 patches:

1. Test cases from Jacob Champion's branch
- Added 355 lines of test cases and 51 lines of executor changes

2. Refactored update_reduced_frame for readability and performance
- Extracted large code blocks into separate functions
- Early return for already-processed positions
- Integrated nfa_unlink_context into nfa_context_free
- Used oldest-first context ordering for early termination

3. Enable context absorption only for SKIP PAST LAST ROW
- Fixes overlapping match issue for SKIP TO NEXT ROW
- In my 5000-row test, absorption showed significant performance gain
for SKIP PAST LAST ROW
- Added your test case to regression tests

Since I implemented it in the order of parser/planner/nfa/executor,
the executor flow became somewhat confusing.
I plan to continue improving it by reviewing and refactoring
from the top-level functions downward.

During refactoring, I plan to simplify pattern optimization and context
absorption logic, keeping only the parts that are clearly correct, even
if this means some performance loss in the initial version. We can add
optimizations back later once the core logic is stable and well-tested.

Best regards,
Henson

Attachment Content-Type Size
0001-Test-cases-from-Jacob.txt text/plain 31.4 KB
0002-Refactor-update_reduced_frame.txt text/plain 19.3 KB
0003-Context-absorption.txt text/plain 7.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2026-01-15 13:29:19 Re: how to gate experimental features (SQL/PGQ)
Previous Message Christoph Berg 2026-01-15 13:15:30 Re: Proposal: SELECT * EXCLUDE (...) command