Re: Row pattern recognition

From: Henson Choi <assam258(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-04-04 07:31:13
Message-ID: CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt=w6wzA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I'd like to propose an extension to the context absorption
optimization for Row Pattern Recognition in the window clause:
fixed-length iteration group absorption. This builds on the
two-flag absorption design [1] and three-phase execution
framework [2], and is a sibling optimization to PREFIX pattern
absorption [3].

The existing framework requires group children to have exactly
{1,1} quantifiers. This relaxes the constraint to accept any
fixed-length quantifier (min == max), allowing patterns like
(A B{2})+ and (A (B{2} C){4})+ to benefit from absorption
with no runtime changes.

PREFIX absorption [3] handles fixed-length elements before
the unbounded repetition; this handles fixed-length elements
inside the unbounded group.

[1] Context absorption design - two-flag approach
https://www.postgresql.org/message-id/CAAAe_zCmjDRf9u19xRRbtzq%2B-7ujqadZk0izz0mkef2j%2BWpBBA%40mail.gmail.com
[2] Three-phase absorption framework (Match -> Absorb -> Advance)
https://www.postgresql.org/message-id/CAAAe_zAGLOF_qporKahnL8ajSu_eXZYsJN2M1cEeF5n5GWSNdQ%40mail.gmail.com
[3] PREFIX pattern absorption
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com
[4] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com

Design Note: Fixed-Length Iteration Group Absorption
=====================================================

1. Problem Definition
=====================

1.1 Current Absorption Scope

Context absorption currently requires that children of an unbounded
group have exactly {1,1} quantifiers. This means only simple patterns
like (A B)+ qualify for absorption.

Patterns like (A B{2})+ or (A (B{2} C){4})+ are excluded even though
each iteration consumes a fixed number of rows.

1.2 The Fixed-Length Constraint

The existing {1,1} check is overly restrictive. The actual requirement
for absorption is that each iteration of the group consumes a fixed
number of rows, so that the count-dominance property holds: an older
context always has a repeat count >= any newer context at the same
element.

Any group where all children have min == max (recursively) satisfies
this property, regardless of nesting depth or quantifier values.

2. Relationship to the Existing Framework
==========================================

2.1 Position in the Three-Phase Execution Model

The Match -> Absorb -> Advance model [2] is preserved as-is. This
optimization only changes compile-time eligibility analysis in the
two-flag framework [1]; the Absorb phase logic is unchanged.

2.2 Prerequisites

The same four prerequisites as existing absorption apply:

1. Greedy quantifier -- the outer group must be greedy.
2. SKIP TO PAST LAST ROW -- prior matches suppress subsequent
start points.
3. UNBOUNDED FOLLOWING -- all contexts see the same future rows.
4. Shared DEFINE evaluation -- all contexts produce the same
DEFINE result for a given row [4].

2.3 Extension of the Dominance Order

The existing dominance condition is unchanged:

C_old and C_new are in the absorbable region of the same
element, and C_old.counts >= C_new.counts.

The only change is which patterns are recognized as having an
absorbable region. The runtime count comparison works identically
regardless of the step size per iteration.

3. Correctness Argument
========================

A fixed-length quantifier (min == max) is semantically equivalent
to repeating the element that many times. Therefore any pattern
with fixed-length children can be conceptually unrolled to {1,1}
elements:

(A B{2})+ -> (A B B)+
(A (B{2} C){4})+ -> (A B B C B B C B B C B B C)+
((A{2} B{3}){2})+ -> (A A B B B A A B B B)+

The unrolled form contains only {1,1} VARs inside the unbounded
group, which is exactly the existing Case 2 that is already proven
correct for absorption.

This optimization recognizes fixed-length groups at compile time
without actually unrolling them. The runtime behavior is identical
to the unrolled form: each iteration consumes a fixed number of
rows, count-dominance holds, and absorption proceeds unchanged.

4. Compile-Time Analysis
=========================

4.1 Change to isUnboundedStart()

Currently, Case 2 in isUnboundedStart() requires all children of
an unbounded group to be simple {1,1} VARs. The relaxation:

- VAR children: accept min == max (any value)
- BEGIN children (nested subgroups): recursively verify that all
descendants have min == max, including the subgroup's own END
quantifier
- ALT children: reject (unchanged)

The check follows the existing depth-based traversal of the flat
element array.

4.2 Flat Array Example

Pattern: (A (B{2} C){4})+

[0] BEGIN depth=0 min=1 max=INF <- outer group
[1] A depth=1 min=1 max=1 <- 1==1 OK
[2] BEGIN depth=1 min=4 max=4 <- subgroup
[3] B depth=2 min=2 max=2 <- 2==2 OK
[4] C depth=2 min=1 max=1 <- 1==1 OK
[5] END depth=1 min=4 max=4 <- 4==4 OK
[6] END depth=0 min=1 max=INF <- unbounded, greedy

Verification traverses [1]-[5] confirming min==max at every
level. [6] is the outer END: unbounded and greedy, so absorption
applies.

4.3 Runtime: No Changes

The runtime absorption logic (nfa_try_absorb_context,
nfa_states_covered) and flag assignment (RPR_ELEM_ABSORBABLE,
RPR_ELEM_ABSORBABLE_BRANCH) require no changes.

5. Worked Example
==================

Pattern: (A B{2})+
Step size = 1 + 2 = 3.

Data: rows match in repeating A, B, B cycle.
Contexts that fail A at B-rows die immediately (omitted).

Row Data C_old C_new Active
r0 A A - 1
r1 B B(cnt=1) - 1
r2 B B(cnt=2)->END - 1
count=1, loops
r3 A A A 2
r4 B B(cnt=1) B(cnt=1) 2
r5 B B(cnt=2)->END B(cnt=2)->END 2
count=2 count=1
absorb: 2 >= 1 -> C_new absorbed
C_old loops 1
r6 A A A 2
r7 B B(cnt=1) B(cnt=1) 2
r8 B B(cnt=2)->END B(cnt=2)->END 2
count=3 count=1
absorb: 3 >= 1 -> C_new absorbed
C_old loops 1

Absorption occurs at END (the ABSORBABLE judgment point),
where both contexts synchronize after completing one full
iteration. The older context has a strictly higher group
count and absorbs the newer one.

Between END points, both contexts progress in lockstep
at the same pattern positions. Maximum concurrent contexts
during this phase = 2. Overall complexity remains O(n).

6. Non-Qualifying Patterns
===========================

Patterns where any child has min != max do NOT qualify:

(A B+ C)+ -- B+ has min=1, max=INF -> NOT fixed
(A B{2,5})+ -- B has min=2, max=5 -> NOT fixed
(A B?)+ -- B? has min=0, max=1 -> NOT fixed

These patterns have variable step sizes per iteration, so
count-dominance is not guaranteed. They remain non-absorbable
under this optimization.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Lukas Fittl 2026-04-04 08:11:25 Re: pg_plan_advice
Previous Message Hayato Kuroda (Fujitsu) 2026-04-04 06:20:35 RE: Adding REPACK [concurrently]