| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "Kamil Monicz" <kamil(at)monicz(dot)dev> |
| Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(at)eisentraut(dot)org>, Vik Fearing <vik(at)postgresfriends(dot)org> |
| Subject: | Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 |
| Date: | 2025-11-11 19:35:26 |
| Message-ID: | 2221261.1762889726@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
I wrote:
> Here is a draft fix for this. What it basically decides is that
> commit b0cc0a71e was in error to suppose that an outer CTE reference
> should work like an outer Var reference. In this even-more-simplified
> test case:
> WITH a AS (
> SELECT id FROM (VALUES (1), (2)) AS v(id)
> ),
> b AS (
> SELECT max((SELECT sum(id) FROM a)) AS agg
> )
> SELECT agg FROM b;
I wanted to post a little more analysis of what's happening here,
mostly for the archives' sake. Since b0cc0a71e,
check_agg_arguments_walker mistakenly decides that the max()
aggregate ought to belong to the query level where the "a" CTE
is, that is the outer "WITH ... SELECT agg FROM b". In an
assert-enabled build, check_agglevels_and_constraints promptly
crashes at
switch (pstate->p_expr_kind)
{
case EXPR_KIND_NONE:
Assert(false); /* can't happen */
break;
because it's looking at the ParseState for that outer query
level, where we are not examining any particular subexpression.
In a non-assert build, we more-or-less-accidentally get through
check_agglevels_and_constraints unscathed, but then the parser
fails with
ERROR: column "b.agg" must appear in the GROUP BY clause or be used in an aggregate function
LINE 7: SELECT agg FROM b;
^
because we've marked the outer query level with p_hasAggs = true.
(This seems like sufficient proof that we're assigning the aggregate
to the wrong level, if you were doubting that conclusion.)
The submitted problem query is shaped a little differently, though.
It's more nearly
WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT (SELECT agg FROM b);
and that extra level of sub-select avoids the aforesaid error message,
since the sub-select where the "agg" reference is doesn't get marked
p_hasAggs. (I wonder whether the OP introduced that extra sub-select
in trying to work around this bug.)
Rather remarkably, this formulation actually gets the expected answer
"3", although if you look at the generated plan it's fairly wacko:
postgres=# explain (verbose, costs off) WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT (SELECT agg FROM b);
QUERY PLAN
-----------------------------------------------
Aggregate
Output: (SubPlan expr_1)
InitPlan expr_2
-> Aggregate
Output: sum("*VALUES*".column1)
-> Values Scan on "*VALUES*"
Output: "*VALUES*".column1
-> Result
SubPlan expr_1
-> Result
Output: max((InitPlan expr_2).col1)
(11 rows)
I don't quite understand how that works, since the max() call is
not within the upper Aggregate's expression trees: why isn't it
triggering "Aggref found in non-Agg plan node"? It's probably not
worth figuring out though; this is all garbage-in-garbage-out behavior
so far as the planner is concerned. Variants of this, such as
inserting MATERIALIZED for "b", lead to planner failures like
ERROR: unexpected outer reference in CTE query
which is similar to the original report.
Anyway, the point is that we are assigning the wrong semantic level
to the max() aggregate, and all the rest of this behavior is just
follow-on effects of that.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2025-11-11 20:27:59 | Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 |
| Previous Message | Vik Fearing | 2025-11-11 16:53:45 | Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 |