| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-04-04 07:31:13 |
| Message-ID: | CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt=w6wzA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
I'd like to propose an extension to the context absorption
optimization for Row Pattern Recognition in the window clause:
fixed-length iteration group absorption. This builds on the
two-flag absorption design [1] and three-phase execution
framework [2], and is a sibling optimization to PREFIX pattern
absorption [3].
The existing framework requires group children to have exactly
{1,1} quantifiers. This relaxes the constraint to accept any
fixed-length quantifier (min == max), allowing patterns like
(A B{2})+ and (A (B{2} C){4})+ to benefit from absorption
with no runtime changes.
PREFIX absorption [3] handles fixed-length elements before
the unbounded repetition; this handles fixed-length elements
inside the unbounded group.
[1] Context absorption design - two-flag approach
https://www.postgresql.org/message-id/CAAAe_zCmjDRf9u19xRRbtzq%2B-7ujqadZk0izz0mkef2j%2BWpBBA%40mail.gmail.com
[2] Three-phase absorption framework (Match -> Absorb -> Advance)
https://www.postgresql.org/message-id/CAAAe_zAGLOF_qporKahnL8ajSu_eXZYsJN2M1cEeF5n5GWSNdQ%40mail.gmail.com
[3] PREFIX pattern absorption
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com
[4] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com
Design Note: Fixed-Length Iteration Group Absorption
=====================================================
1. Problem Definition
=====================
1.1 Current Absorption Scope
Context absorption currently requires that children of an unbounded
group have exactly {1,1} quantifiers. This means only simple patterns
like (A B)+ qualify for absorption.
Patterns like (A B{2})+ or (A (B{2} C){4})+ are excluded even though
each iteration consumes a fixed number of rows.
1.2 The Fixed-Length Constraint
The existing {1,1} check is overly restrictive. The actual requirement
for absorption is that each iteration of the group consumes a fixed
number of rows, so that the count-dominance property holds: an older
context always has a repeat count >= any newer context at the same
element.
Any group where all children have min == max (recursively) satisfies
this property, regardless of nesting depth or quantifier values.
2. Relationship to the Existing Framework
==========================================
2.1 Position in the Three-Phase Execution Model
The Match -> Absorb -> Advance model [2] is preserved as-is. This
optimization only changes compile-time eligibility analysis in the
two-flag framework [1]; the Absorb phase logic is unchanged.
2.2 Prerequisites
The same four prerequisites as existing absorption apply:
1. Greedy quantifier -- the outer group must be greedy.
2. SKIP TO PAST LAST ROW -- prior matches suppress subsequent
start points.
3. UNBOUNDED FOLLOWING -- all contexts see the same future rows.
4. Shared DEFINE evaluation -- all contexts produce the same
DEFINE result for a given row [4].
2.3 Extension of the Dominance Order
The existing dominance condition is unchanged:
C_old and C_new are in the absorbable region of the same
element, and C_old.counts >= C_new.counts.
The only change is which patterns are recognized as having an
absorbable region. The runtime count comparison works identically
regardless of the step size per iteration.
3. Correctness Argument
========================
A fixed-length quantifier (min == max) is semantically equivalent
to repeating the element that many times. Therefore any pattern
with fixed-length children can be conceptually unrolled to {1,1}
elements:
(A B{2})+ -> (A B B)+
(A (B{2} C){4})+ -> (A B B C B B C B B C B B C)+
((A{2} B{3}){2})+ -> (A A B B B A A B B B)+
The unrolled form contains only {1,1} VARs inside the unbounded
group, which is exactly the existing Case 2 that is already proven
correct for absorption.
This optimization recognizes fixed-length groups at compile time
without actually unrolling them. The runtime behavior is identical
to the unrolled form: each iteration consumes a fixed number of
rows, count-dominance holds, and absorption proceeds unchanged.
4. Compile-Time Analysis
=========================
4.1 Change to isUnboundedStart()
Currently, Case 2 in isUnboundedStart() requires all children of
an unbounded group to be simple {1,1} VARs. The relaxation:
- VAR children: accept min == max (any value)
- BEGIN children (nested subgroups): recursively verify that all
descendants have min == max, including the subgroup's own END
quantifier
- ALT children: reject (unchanged)
The check follows the existing depth-based traversal of the flat
element array.
4.2 Flat Array Example
Pattern: (A (B{2} C){4})+
[0] BEGIN depth=0 min=1 max=INF <- outer group
[1] A depth=1 min=1 max=1 <- 1==1 OK
[2] BEGIN depth=1 min=4 max=4 <- subgroup
[3] B depth=2 min=2 max=2 <- 2==2 OK
[4] C depth=2 min=1 max=1 <- 1==1 OK
[5] END depth=1 min=4 max=4 <- 4==4 OK
[6] END depth=0 min=1 max=INF <- unbounded, greedy
Verification traverses [1]-[5] confirming min==max at every
level. [6] is the outer END: unbounded and greedy, so absorption
applies.
4.3 Runtime: No Changes
The runtime absorption logic (nfa_try_absorb_context,
nfa_states_covered) and flag assignment (RPR_ELEM_ABSORBABLE,
RPR_ELEM_ABSORBABLE_BRANCH) require no changes.
5. Worked Example
==================
Pattern: (A B{2})+
Step size = 1 + 2 = 3.
Data: rows match in repeating A, B, B cycle.
Contexts that fail A at B-rows die immediately (omitted).
Row Data C_old C_new Active
r0 A A - 1
r1 B B(cnt=1) - 1
r2 B B(cnt=2)->END - 1
count=1, loops
r3 A A A 2
r4 B B(cnt=1) B(cnt=1) 2
r5 B B(cnt=2)->END B(cnt=2)->END 2
count=2 count=1
absorb: 2 >= 1 -> C_new absorbed
C_old loops 1
r6 A A A 2
r7 B B(cnt=1) B(cnt=1) 2
r8 B B(cnt=2)->END B(cnt=2)->END 2
count=3 count=1
absorb: 3 >= 1 -> C_new absorbed
C_old loops 1
Absorption occurs at END (the ABSORBABLE judgment point),
where both contexts synchronize after completing one full
iteration. The older context has a strictly higher group
count and absorbs the newer one.
Between END points, both contexts progress in lockstep
at the same pattern positions. Maximum concurrent contexts
during this phase = 2. Overall complexity remains O(n).
6. Non-Qualifying Patterns
===========================
Patterns where any child has min != max do NOT qualify:
(A B+ C)+ -- B+ has min=1, max=INF -> NOT fixed
(A B{2,5})+ -- B has min=2, max=5 -> NOT fixed
(A B?)+ -- B? has min=0, max=1 -> NOT fixed
These patterns have variable step sizes per iteration, so
count-dominance is not guaranteed. They remain non-absorbable
under this optimization.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Lukas Fittl | 2026-04-04 08:11:25 | Re: pg_plan_advice |
| Previous Message | Hayato Kuroda (Fujitsu) | 2026-04-04 06:20:35 | RE: Adding REPACK [concurrently] |