| From: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | zsolt(dot)parragi(at)percona(dot)com, sjjang112233(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-04-10 09:35:15 |
| Message-ID: | 20260410.183515.1561517749674643173.ishii@postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Henson,
> Hi Tatsuo,
>
> Thank you for the careful review.
>
> Before this, there's a check in the function:
>>
>> if (tle->ressortgroupref || tle->resjunk)
>> continue;
>>
>> Below is the query in the regression test that is suppoed to trigger
>> the error. In this case column "i" in the target list should have
>> resjunk flag to be set to true. So I thought the if statenment above
>> becomes true and the column i is not removed from subquery's target
>> list. Am I missing something?
>
>
> That is a very good observation. For the test query you cited, the
> resjunk check does indeed protect column "i", as you noted.
>
> The allpaths.c guard is needed for a different scenario: when a Var
> already exists in the inner SELECT as a non-resjunk entry, but the
> outer query does not reference it. For example:
>
> SELECT count(*) FROM (
> SELECT i, count(*) OVER w FROM generate_series(1,10) i
> -- ^ in SELECT -> resjunk=false
> WINDOW w AS (
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (A+)
> DEFINE A AS i > PREV(i)
> )
> ) t;
> -- outer uses only count(*) -> i is not in attrs_used
>
> In this case, the Var addition logic in parse_rpr.c (pull_var_clause
> + dup check) finds that "i" already exists in the targetlist, so it
> does not add a new resjunk entry. Since "i" is resjunk=false and the
> outer query does not reference it, remove_unused_subquery_outputs()
> would replace it with NULL, breaking DEFINE evaluation.
>
> Changing the existing entry to resjunk=true is not an option either,
> since it would alter the subquery's output schema -- the outer query
> may reference that column, and at parse time we do not yet know
> whether it will. The existing protection mechanisms (resjunk,
> ressortgroupref, attrs_used, volatile check) cannot express
> "referenced by DEFINE", since DEFINE is a new column reference path
> that did not exist in PostgreSQL before. The allpaths.c guard
> follows the same pattern as the existing SRF and volatile guards in
> that function.
>
> Also I think removal of the WindowAgg node is fine here because it's
>> not necessary to calculate the result of the sub query at
>> all. Actually the query above runs fine on v46. I guess some of the
>> incremental patches caused this to stop working. Can you elaborate?
>>
>
> I agree -- for this specific query, removing the WindowAgg does not
> change the count(*) result, since window functions do not change the
> number of rows.
>
> Regarding v46: the two bugs were actually latent from the initial
> implementation, not caused by incremental patches. The old code
> added the entire DEFINE expression (e.g., "i > PREV(i)") to the
> query targetlist via findTargetlistEntrySQL99(). In most cases,
> DEFINE conditions involve comparison operators, so the resulting
> OpExpr did not match existing Var entries in the targetlist, and a
> new resjunk entry was created. This accidentally prevented removal.
> (In fact, a bare boolean DEFINE such as "DEFINE A AS flag" would
> have exhibited the same removal bug even in v46, since
> findTargetlistEntrySQL99 would reuse the existing entry instead
> of creating a new resjunk one -- confirming that the issue was
> latent in the original implementation, not introduced by recent
> patches.)
>
> However, that same approach caused the SIGSEGV (Bug 1): when RPR
> and non-RPR windows coexist, the non-RPR WindowAgg inherits
> targetlist entries containing RPRNavExpr nodes that it cannot
> evaluate.
>
> The two bugs are two sides of the same coin -- the old approach
> prevented removal but caused SIGSEGV. Switching to Var-only
> addition fixed the SIGSEGV but required the allpaths.c guard to
> prevent removal.
>
> As for more precise optimization (allowing removal when all
> WindowFuncs for an RPR clause are unused), it would require
> structural changes to the loop in remove_unused_subquery_outputs(),
> which currently evaluates entries individually. Since this function
> runs for all subqueries, not just RPR, such a change would need
> broader testing.
>
> The patch addresses two distinct issues: the parse_rpr.c change
> (Var-only addition) prevents RPRNavExpr from propagating to
> non-RPR WindowAgg nodes, while the allpaths.c guard prevents
> incorrect column removal. Since both are needed, the guard
> cannot simply be reverted.
Thank you for the detailed explanation. That makes sense.
> Would it make sense to keep it as-is
> for now, and explore the more precise optimization as a separate
> follow-up patch?
I agree keep it as is. I think priority of "more precise optimization"
is low for now. We could explore when we have time. What do you think?
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-04-10 09:52:34 | Re: Fix slotsync worker busy loop causing repeated log messages |
| Previous Message | Erik Rijkers | 2026-04-10 09:18:01 | pg_stash_advice doc |