Re: 9.2rc1 produces incorrect results

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vik Reykja <vikreykja(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.2rc1 produces incorrect results
Date: 2012-09-04 18:52:34
Message-ID: 23021.1346784754@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Vik Reykja <vikreykja(at)gmail(dot)com> writes:
>> Hello. It took me a while to get a version of this that was independent of
>> my data, but here it is. I don't understand what's going wrong but if you
>> change any part of this query (or at least any part I tried), the correct
>> result is returned.

> Huh. 9.1 gets the wrong answer too, so this isn't a (very) new bug;
> but curiously, 8.4 and 9.0 seem to get it right. I think this is
> probably related somehow to Adam Mackler's recent report --- multiple
> scans of the same CTE seems to be a bit of a soft spot :-(

No, I'm mistaken: it's a planner bug. The plan looks like this:

QUERY PLAN
-------------------------------------------------------------------------------------------
Result (cost=281.29..290.80 rows=20 width=36)
CTE a
-> Nested Loop (cost=126.96..280.17 rows=19 width=4)
-> Merge Right Join (cost=126.96..220.17 rows=19 width=4)
Merge Cond: (((pg_c.relname)::text) = ((t2.id)::text))
Filter: (pg_c.oid IS NULL)
-> Sort (cost=61.86..63.72 rows=743 width=68)
Sort Key: ((pg_c.relname)::text)
-> Seq Scan on pg_class pg_c (cost=0.00..26.43 rows=743 width=68)
-> Sort (cost=65.10..67.61 rows=1004 width=4)
Sort Key: ((t2.id)::text)
-> Seq Scan on t2 (cost=0.00..15.04 rows=1004 width=4)
-> Index Scan using t1_pkey on t1 (cost=0.00..3.14 rows=1 width=4)
Index Cond: (id = t2.id***)
CTE b
-> WindowAgg (cost=0.78..1.12 rows=19 width=4)
-> Sort (cost=0.78..0.83 rows=19 width=4)
Sort Key: a.id
-> CTE Scan on a (cost=0.00..0.38 rows=19 width=4)
-> Append (cost=0.00..9.51 rows=20 width=36)
-> CTE Scan on a (cost=0.00..4.66 rows=10 width=4)
Filter: is_something
SubPlan 3
-> CTE Scan on b (cost=0.00..0.43 rows=1 width=4)
Filter: (id = a.id***)
-> CTE Scan on a (cost=0.00..4.66 rows=10 width=4)
Filter: is_something
SubPlan 4
-> CTE Scan on b (cost=0.00..0.43 rows=1 width=4)
Filter: (id = a.id***)
(30 rows)

The planner is assigning a single PARAM_EXEC slot for the parameter
passed into the inner indexscan in "CTE a" and for the parameters passed
into the two SubPlans that scan "CTE b" (the items I marked with "***"
above). This is safe enough so far as the two SubPlans are concerned,
because they don't execute concurrently --- but when SubPlan 3 is first
fired, it causes the remainder of CTE a to be computed, so that the
nestloop gets iterated some more times, and that overwrites the value of
"a.id" that already got passed down into SubPlan 3.

The reason this is so hard to replicate is that the PARAM_EXEC slot can
only get re-used for identical-looking Vars (same varno, varlevelsup,
vartype, etc) --- so even granted that you've got the right shape of
plan, minor "unrelated" changes in the query can stop the aliasing
from occurring. Also, inner indexscans weren't using the PARAM_EXEC
mechanism until 9.1, so that's why the example doesn't fail before 9.1.

I've always been a bit suspicious of the theory espoused in
replace_outer_var that aliasing different Params is OK:

* NOTE: in sufficiently complex querytrees, it is possible for the same
* varno/abslevel to refer to different RTEs in different parts of the
* parsetree, so that different fields might end up sharing the same Param
* number. As long as we check the vartype/typmod as well, I believe that
* this sort of aliasing will cause no trouble. The correct field should
* get stored into the Param slot at execution in each part of the tree.

but I've never seen a provably wrong case before. Most likely, this has
been broken since we introduced CTEs in 8.4: it's the decoupled timing
of execution of main query and CTE that's needed to allow execution of
different parts of the plan tree to overlap and thus create the risk.

(I get the impression that only recently have people been writing really
complex CTE queries, since we found another fundamental bug with them
just recently.)

I think probably the best fix is to rejigger things so that Params
assigned by different executions of SS_replace_correlation_vars and
createplan.c can't share PARAM_EXEC numbers. This will result in
rather larger ecxt_param_exec_vals arrays at runtime, but the array
entries aren't very large, so I don't think it'll matter.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-09-04 19:09:04 pg_upgrade diffs on WIndows
Previous Message Andrew Dunstan 2012-09-04 18:25:27 Re: pg_upgrade del/rmdir path fix