Re: Row pattern recognition

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: assam258(at)gmail(dot)com
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, li(dot)evan(dot)chao(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-05-28 13:23:59
Message-ID: CACJufxGX17thWuEOq1tM5xbRHz2HXm1asooZC3GV25MYGmYqLQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

In src/include/nodes/execnodes.h, we're adding quite a few fields to
WindowAggState that are only used for RPR queries.
Should we consolidate these fields behind a single pointer (named
RPRContext) to keep the WindowAggState size smaller for non-RPR
queries?

In function get_reduced_frame_status, I made the following changes:
``````
bool conditionA;
bool conditionB;
bool conditionC;
bool conditionD;
if (!winstate->rpr_match_valid)
return RF_NOT_DETERMINED;
conditionA = (pos == start && winstate->rpr_match_matched && length == 0);
conditionB = (pos < start || pos >= start + length);
conditionC = !winstate->rpr_match_matched;
conditionD = (pos == start);
if ((conditionA && conditionB) || (conditionA && conditionC) ||
(conditionA && conditionD))
{
if (conditionA)
elog(INFO, "conditionA is true");
if (conditionB)
elog(INFO, "conditionB is true");
if (conditionC)
elog(INFO, "conditionC is true");
if (conditionD)
elog(INFO, "conditionD is true");
}
if ((conditionB && conditionC) || (conditionB && conditionD) ||
(conditionC && conditionD))
{
elog(INFO, "more than 2 branch is true, should not be reached");
if (!(conditionC && conditionD))
elog(INFO, "not (conditionCD is true)");
if ((conditionB && conditionD))
elog(INFO, "conditionB and conditionD is true)");
}
``````
I re-ran the regression tests and noticed it's entirely possible for multiple
conditions to be true at once. Because of this, all the IF blocks in
get_reduced_frame_status (after the initial one) are order-dependent. If I move
one IF statement above another, get_reduced_frame_status may return a
different result.
Is this the expected behavior? It seems like a potential logic flaw.

------------------------------------------------------------------------------------
VIII-3. Absorption Conditions

Planner-time prerequisites (all must hold for absorption to be enabled):

(a) SKIP PAST LAST ROW. SKIP TO NEXT ROW creates overlapping
contexts that cannot be safely absorbed.
(b) Unbounded frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING). Limited frames apply differently to each context,
breaking the monotonicity principle.
(c) No match_start-dependent navigation in DEFINE.

Mechanism: each context has a different matchStartRow, so FIRST
resolves to a different row for each context at the same
currentpos. An earlier context's DEFINE result no longer
subsumes a later one's, making count-dominance comparison
invalid. Rather than comparing matchStartRow at runtime
(which would complicate the absorb path), any match_start
dependency disables absorption entirely.

Navigation content match_start dep. absorption
------------------------------------------------------------
No navigation none safe
PREV/NEXT only none safe
LAST (no offset) none safe
LAST (with offset) boundary check unsafe
FIRST (any) direct unsafe
Compound (inner FIRST) direct unsafe
Compound (inner LAST, no off.) none safe
Compound (inner LAST, w/off.) boundary chk unsafe
------------------------------------------------------------------------------------
"boundary chk" should be "boundary check".
column "match_start dep.", I don't understand the meaning of "direct"
and "boundary check".
"count-dominance" seems like an invented word, but I could not find
the explanation.
"match_start-dependent" occurred many times, it would also deserve an
explanation.
We can also rename it as match_start_dependent.

----------------------------------------------------
Attached is a minor refactoring of ExecRPRProcessRow, no need for
boolean hasLimitedFrame.
``if (hasLimitedFrame && winstate->endOffsetValue != 0)`` seems wrong
to me, endOffsetValue is a Datum,
and you directly compare it with 0.
see also calculate_frame_offsets.
----------------------------------------------------
ExecRPRProcessRow,
nfa_evaluate_row
```
if (!window_gettupleslot(winobj, pos, slot))
return false; /* No row exists */
```
indicates that it's not possible for (currentPos > ctxFrameEnd).
therefore
`````
if (currentPos >= ctxFrameEnd)
{
/* Frame boundary exceeded: force mismatch */
nfa_match(winstate, ctx, NULL);
continue;
}
`````
should change to ```if (currentPos == ctxFrameEnd)```.
With that, the comment wording "exceeded" seems wrong too.

The comment 'Frame boundary exceeded' also needs updating since
it's no longer exceeding the boundary.
----------------------------------------------------
Since I couldn't generate a coverage report, I am using elog(INFO) to test
reachability. Rerun the regress tests, you will find out that the ELSE IF branch
below is not reached by current regress tests.
``````
else if (targetCtx->states == NULL)
{
/* Context already completed - skip to result registration */
goto register_result;
elog(INFO, "XXXX reached");
}
``````

--
jian
https://www.enterprisedb.com/

Attachment Content-Type Size
v47-0001-misc-refactor-for-v47-ExecRPRProcessRow.no-cfbot application/octet-stream 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilmar Y 2026-05-28 13:41:58 Re: [PATCH] Add pg_get_policy_ddl() function to reconstruct CREATE POLICY statement
Previous Message Thom Brown 2026-05-28 13:13:46 Re: generic plans and "initial" pruning