Re: 8.2 bug with outer join reordering

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 8.2 bug with outer join reordering
Date: 2006-12-06 22:20:41
Message-ID: 2918.1165443641@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

I wrote:
> I can't reproduce it with less than four tables, so it's a pretty
> weird corner case. Apparently there's something wrong with
> distribute_qual_to_rels' logic for determining qual placement, but
> I'm not sure what yet ...

OK, I've localized the problem and been able to reduce it to a
three-table example. Using the regression database:

select count(*) from
tenk1 a left join tenk1 b on (a.unique1=b.unique1)
left join onek c on (b.unique1=c.unique1)
where case c.ten when 1 then true else false end;

This should report 100, but in 8.2 it's reporting 10000, because the
WHERE condition is being pushed down below the outer join with "a".
(Note: you might think the WHERE should be just "c.ten = 1", but the
planner understands that that's strict and reduces all the left joins
to plain joins, hiding the bug. It's not bright enough to see the
CASE as strict, however.)

The problem comes in distribute_qual_to_rels() where it's trying to
decide the minimum set of relations that have to be joined before the
WHERE qual can be applied:

* For a non-outer-join qual, we can evaluate the qual as soon as (1)
* we have all the rels it mentions, and (2) we are at or above any
* outer joins that can null any of these rels and are below the
* syntactic location of the given qual. To enforce the latter, scan
* the oj_info_list and merge the required-relid sets of any such OJs
* into the clause's own reference list. At the time we are called,
* the oj_info_list contains only outer joins below this qual.

The oj_info_list contains two outer joins, respectively having

min_lefthand = a min_righthand = b

min_lefthand = b min_righthand = c

which (correctly) asserts than we can do either the a/b or b/c join
first ... but not a/c first. However, since the WHERE qual mentions
only c, distribute_qual_to_rels concludes that it can be applied as
soon as the b/c join is formed. And because of the relative sizes
of the tables, the planner prefers to do that join before the a/c one.

The reason the bug doesn't exist in 8.1 and older is that those releases
applied essentially this logic using the *syntactic* sets of relations
contained in different levels of outer join, and so a qual using only
(c) would have a and b added to it based on the syntax tree alone.

I think that this can be fixed by having the code iterate over the
oj_info_list repeatedly, until unioning adds no more rels. That is, in
the first pass we'd have (c) and we'd add b because c appears in the RHS
of the second OJ; then in the second pass we'd start with (b,c) and add
a because b appears in the RHS of the first OJ; then the third pass
would start with (a,b,c), find nothing to add, and be done. This is
appropriate because an OJ having b on its RHS must in fact have been
syntactically above the b/c join --- the only reason it doesn't have c
in its RHS now is that make_outerjoininfo() decided it was safe to
interchange the ordering of the two OJs. But a qual coming from above
both OJs can't be allowed to bubble down below either.

Comments? Anyone see any remaining holes in this logic?

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2006-12-06 22:42:47 Re: BUG #2812: Transaction is aborted after error
Previous Message Tom Lane 2006-12-06 20:59:32 Re: 8.2 bug with outer join reordering

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2006-12-06 22:27:36 Re: old synchronized scan patch
Previous Message ktm 2006-12-06 22:13:21 Re: Hi