Improving planning of outer joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Improving planning of outer joins
Date: 2005-12-15 02:27:36
Message-ID: 8802.1134613656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've spent some time looking into how we can improve our planning of outer
joins. The current planner code slavishly follows the syntactic join
order, which can lead to quite bad plans. The reason it does this is that
in some cases altering the join order of outer joins can change the
results. However, there are many cases where the results would not be
changed, and we really need to start taking advantage of those cases.

After some review of the literature, I've found out that these identities
always hold:
(A leftjoin B on (Pab)) innerjoin C on (Pac)
= (A innerjoin C on (Pac)) leftjoin B on (Pab)
(where Pac is a predicate not referencing B, else of course the
transformation is nonsensical)
(A leftjoin B on (Pab)) leftjoin C on (Pac)
= (A leftjoin C on (Pac)) leftjoin B on (Pab)
Also,
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
= A leftjoin (B leftjoin C on (Pbc)) on (Pab)
if predicate Pbc must fail for all-null B rows.

The case that does *not* work is moving an innerjoin into or out of the
nullable side of an outer join:
A leftjoin (B join C on (Pbc)) on (Pab)
!= (A leftjoin B on (Pab)) join C on (Pbc)

There is some stuff in the literature about how to make transformations
of the last kind, but it requires additional executor smarts to do strange
sorts of "generalized outer join" operations. I'm disinclined to tackle
that for now, partly because it seems like a lot of work for uncertain
payback (the first three cases cover most of the practical examples I've
looked at), and partly because I'm not too sure about the patent situation
for these things. The three identities shown above have been known since
the late '80s and should be safe enough to work with.

What I'm thinking of doing is changing the planner so that a left or right
join node is broken apart into two separate relations (effectively
FROM-list items) that participate in the normal join-order search process,
subject to these limitations:
* A regular join within the nullable side (the right side of a left join,
etc) remains indivisible and must be planned separately. In other
cases we recursively break down sub-joins into separate relations.
* The nullable side is marked to show that its first join must be to a set
of relations including at least those relations on the outer side that
are referenced in the outer-join condition. (This compares to the
current state of affairs, which is that its first join is always to
exactly the relation(s) on the outer side.) If the outer-join condition
isn't strict for these outer-side relations, however, we have to punt
and mark the nullable side as requiring all of the outer-side relations
for its first join (this covers the case where the third identity above
doesn't hold).

Full joins are not covered by this and will continue to be planned as now.

I'm not sure whether we'd need any additional planner knobs to control
this. I think that the existing join_collapse_limit GUC variable should
continue to exist, but its effect on left/right joins will be the same as
for inner joins. If anyone wants to force join order for outer joins more
than for inner joins, we'd need some other control setting, but I don't
currently see why that would be very useful.

Does this seem like a reasonable agenda, or am I thinking too small?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Premsun Choltanwanich 2005-12-15 02:36:51 Re: lo function changed in PostgreSQL 8.1.1
Previous Message Christopher Kings-Lynne 2005-12-15 01:52:24 Re: Refactoring psql for backward-compatibility