| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)postgreSQL(dot)org | 
| Subject: | Analyzing bug 8049 | 
| Date: | 2013-04-11 17:25:42 | 
| Message-ID: | 6546.1365701142@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
I looked into the problem reported at
http://www.postgresql.org/message-id/E1UPa3B-0004K0-PJ@wrigleys.postgresql.org
which is that we're deriving a bogus plan for this query:
SELECT * FROM
(
  SELECT (COALESCE(h_n || '/', '') || l_n)::text AS fault
  FROM
  (
	SELECT _bug_header.h_n, _bug_line.l_n
	FROM _bug_line
	LEFT OUTER JOIN _bug_header on (_bug_line.h_n = _bug_header.h_n)
  ) AS tmp
) AS tmp2
WHERE (lower(fault) = '1')
ORDER BY lower(fault);
With use of a nestloop forced, we get this plan:
 Nested Loop Left Join  (cost=0.16..513.11 rows=11 width=8)
   ->  Seq Scan on _bug_line  (cost=0.00..31.40 rows=2140 width=8)
   ->  Index Only Scan using _bug_header_unique on _bug_header  (cost=0.16..0.21 rows=1 width=4)
         Index Cond: (h_n = _bug_line.h_n)
         Filter: (lower((COALESCE(((h_n)::text || '/'::text), ''::text) || (_bug_line.l_n)::text)) = '1'::text)
in which the problem is pretty obvious: the upper WHERE condition has been
pushed to the scan of _bug_header, at which level we get the wrong answers
from the COALESCE because the reference to _bug_header.h_n doesn't go to
NULL when it should do so because of the outer join.
Curiously, this doesn't happen without the ORDER BY.  (If you're wondering
why there's no sort step in this plan, it's because the planner figures
out that the equality constraint on lower(fault) renders the ORDER BY a
no-op.  But the damage has already been done by then.)
In HEAD, this happens because join_clause_is_movable_to() thinks it's safe
to move the join clause (lower(fault) = '1') to be evaluated inside the
nestloop.  Now, that function is perfectly aware of this type of hazard,
so it's faithfully checking the clause's nullable_relids to make sure it
doesn't push a clause from above an outer join into the RHS of the join.
However, it's seeing NULL as the value of the clause's nullable_relids.
Ooops.  And the reason for that is that what it's seeing isn't actually
the original WHERE clause, but a reconstructed version that's been
regurgitated by the EquivalenceClass machinery.  And that machinery is
failing to stamp the constructed clause with the right nullable_relids.
I thought I'd fixed that type of problem last fall, in commit
72a4231f0c80f213a4fa75356dc3c6b7c7419059, which added code to attempt
to compute those relid sets at need.  But there was an oversight in
that patch: get_eclass_for_sort_expr() always sets nullable_relids to
NULL in any new equivalence-class members that it creates to represent
sort keys.  In this example, it's obviously bogus for the ORDER BY
key "lower(fault)" to be considered as having no nullable_relids, because
it's above an outer join that can null _bug_header.h_n.  This explains
why the ORDER BY clause is needed for the bug to manifest --- if the
equivalence-class member for "lower(fault)" is created as a result of
processing the WHERE clause, then it's all good because the member is
created with the right em_nullable_relids.
So, okay, we need to fix get_eclass_for_sort_expr() to compute valid
nullable_relids for any sort clauses it's processing.  This is a bit
problematic because the information isn't readily available to it,
but we can certainly add some code to compute that.
However, reflecting on this example convinces me that there's a bigger
problem: the equivalence-class machinery assumes that textually equivalent
expressions (more formally, equal() expression trees) mean the same
thing.  They don't necessarily.  In this example, _bug_header.h_n means
something different below the outer join than it does above it.  We have
hacked around that problem in the past via various ad-hoc rules involving
not treating equality constraints as true equivalences if they were
affected by outer joins.  But that approach was always pretty squishy,
and I've wished for some more principled way to define the rules.
I now think I see one: to consider two expressions equivalent, we ought
to consider not only textual equality, but whether they are subject to
the effects of the same outer joins, which we could represent as the
relevant nullable_relids sets.  So basically this would mean changing
equivclass.c to consider the identity of an EquivalenceMember as defined
by em_expr plus em_nullable_relids, not just em_expr.  Having done that
I think we could remove all the bizarre code about below_outer_join and
so forth, which is both of dubious correctness and an obstacle to
complete optimization of queries.
This idea needs more fleshing out, but it's seeming awfully attractive
right now.  The big problem with it is that it's going to be a more
invasive patch than I feel terribly comfortable about back-patching.
However, I'm not sure there's much choice, because I don't see any narrow
fix for 9.2 that would not result in very substantial degradation of its
optimization ability.  We can't just lobotomize equivalence-class
processing.
The plan I'm considering is to get this written and committed to HEAD
in the next week, so that it can go out in 9.3beta1.  After the patch
has survived a reasonable amount of beta testing, I'd be more comfortable
about back-patching into 9.2.
BTW, I am also suspicious that there are related examples that fail
pre-9.2, though I don't have one yet.  I previously back-patched
72a4231f0c80f213a4fa75356dc3c6b7c7419059 into all the older branches,
so I am wondering if we might need to back-patch this rewrite further
than 9.2 as well.
Comments?
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Fujii Masao | 2013-04-11 17:29:01 | Re: Inconsistent DB data in Streaming Replication | 
| Previous Message | Fujii Masao | 2013-04-11 17:18:42 | Re: Inconsistent DB data in Streaming Replication |