Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-04-10 09:35:15
Message-ID: 20260410.183515.1561517749674643173.ishii@postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Henson,

> Hi Tatsuo,
>
> Thank you for the careful review.
>
> Before this, there's a check in the function:
>>
>> if (tle->ressortgroupref || tle->resjunk)
>> continue;
>>
>> Below is the query in the regression test that is suppoed to trigger
>> the error. In this case column "i" in the target list should have
>> resjunk flag to be set to true. So I thought the if statenment above
>> becomes true and the column i is not removed from subquery's target
>> list. Am I missing something?
>
>
> That is a very good observation. For the test query you cited, the
> resjunk check does indeed protect column "i", as you noted.
>
> The allpaths.c guard is needed for a different scenario: when a Var
> already exists in the inner SELECT as a non-resjunk entry, but the
> outer query does not reference it. For example:
>
> SELECT count(*) FROM (
> SELECT i, count(*) OVER w FROM generate_series(1,10) i
> -- ^ in SELECT -> resjunk=false
> WINDOW w AS (
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (A+)
> DEFINE A AS i > PREV(i)
> )
> ) t;
> -- outer uses only count(*) -> i is not in attrs_used
>
> In this case, the Var addition logic in parse_rpr.c (pull_var_clause
> + dup check) finds that "i" already exists in the targetlist, so it
> does not add a new resjunk entry. Since "i" is resjunk=false and the
> outer query does not reference it, remove_unused_subquery_outputs()
> would replace it with NULL, breaking DEFINE evaluation.
>
> Changing the existing entry to resjunk=true is not an option either,
> since it would alter the subquery's output schema -- the outer query
> may reference that column, and at parse time we do not yet know
> whether it will. The existing protection mechanisms (resjunk,
> ressortgroupref, attrs_used, volatile check) cannot express
> "referenced by DEFINE", since DEFINE is a new column reference path
> that did not exist in PostgreSQL before. The allpaths.c guard
> follows the same pattern as the existing SRF and volatile guards in
> that function.
>
> Also I think removal of the WindowAgg node is fine here because it's
>> not necessary to calculate the result of the sub query at
>> all. Actually the query above runs fine on v46. I guess some of the
>> incremental patches caused this to stop working. Can you elaborate?
>>
>
> I agree -- for this specific query, removing the WindowAgg does not
> change the count(*) result, since window functions do not change the
> number of rows.
>
> Regarding v46: the two bugs were actually latent from the initial
> implementation, not caused by incremental patches. The old code
> added the entire DEFINE expression (e.g., "i > PREV(i)") to the
> query targetlist via findTargetlistEntrySQL99(). In most cases,
> DEFINE conditions involve comparison operators, so the resulting
> OpExpr did not match existing Var entries in the targetlist, and a
> new resjunk entry was created. This accidentally prevented removal.
> (In fact, a bare boolean DEFINE such as "DEFINE A AS flag" would
> have exhibited the same removal bug even in v46, since
> findTargetlistEntrySQL99 would reuse the existing entry instead
> of creating a new resjunk one -- confirming that the issue was
> latent in the original implementation, not introduced by recent
> patches.)
>
> However, that same approach caused the SIGSEGV (Bug 1): when RPR
> and non-RPR windows coexist, the non-RPR WindowAgg inherits
> targetlist entries containing RPRNavExpr nodes that it cannot
> evaluate.
>
> The two bugs are two sides of the same coin -- the old approach
> prevented removal but caused SIGSEGV. Switching to Var-only
> addition fixed the SIGSEGV but required the allpaths.c guard to
> prevent removal.
>
> As for more precise optimization (allowing removal when all
> WindowFuncs for an RPR clause are unused), it would require
> structural changes to the loop in remove_unused_subquery_outputs(),
> which currently evaluates entries individually. Since this function
> runs for all subqueries, not just RPR, such a change would need
> broader testing.
>
> The patch addresses two distinct issues: the parse_rpr.c change
> (Var-only addition) prevents RPRNavExpr from propagating to
> non-RPR WindowAgg nodes, while the allpaths.c guard prevents
> incorrect column removal. Since both are needed, the guard
> cannot simply be reverted.

Thank you for the detailed explanation. That makes sense.

> Would it make sense to keep it as-is
> for now, and explore the more precise optimization as a separate
> follow-up patch?

I agree keep it as is. I think priority of "more precise optimization"
is low for now. We could explore when we have time. What do you think?

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-04-10 09:52:34 Re: Fix slotsync worker busy loop causing repeated log messages
Previous Message Erik Rijkers 2026-04-10 09:18:01 pg_stash_advice doc