Proposal for supporting outer joins in 7.1

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proposal for supporting outer joins in 7.1
Date: 2000-08-25 20:07:02
Message-ID: 4429.967234022@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I would like to see if we can't get some amount of OUTER JOIN functionality
into the 7.1 release. It seems that the executor implementation should be
pretty simple, at least for the left-join case (which is what people appear
to need the most). What we are mostly missing is a suitable parsetree
representation plus appropriate handling in the planner.

Right now, the rangetable associated with a query serves two purposes.
First, it provides a list of all relations used in the query for Var nodes
to refer to (a Var's "varno" field is just an index into the rangetable
list). Second, it drives the planner's construction of the join tree for
the query (all entries that are marked inJoinSet will be added to the join
tree in some order). I think what we want to do is separate these two
functions into two data structures. The rangetable list is just fine for
looking up Var references, but it's not sufficient information for
representing multiple types of joins. A ready-to-plan Query node should
have an additional subsidiary data structure that describes the required
join tree of relations.

The additional data structure could really be pretty close to the raw parse
structure that the grammar emits for FROM clauses. I envision it as a tree
whose leaf nodes are references to rangetable entries and whose upper-level
nodes correspond to JOIN clauses (each JOIN node has pointers to its child
nodes plus information about the specified join type and any supplied
constraint conditions). If the FROM clause has more than one item, the
top level of the join tree is a CROSS JOIN node with all the FROM-clause
items as children. (Note that at least for this case, a join node needs
to be able to have more than two children.)

If anything, this structure should be easier for the parser/analyzer to emit
than what it's doing now. Notice in particular that ON/USING qual clauses
associated with joins are to be left in the join tree, *not* merged into the
main WHERE clause. (The planner would just have to separate them out again
and discover which join they should go with, so what's the point of merging
them into WHERE?)

I envision the planner as operating by walking this join tree and
considering plans that perform each specified join. At nodes with more than
two children (such as the top-level CROSS JOIN from a many-element FROM
clause) the planner will consider all possible orders for joining the
children together, the same as it does now. Note that this implementation
implies that the user can guide/constrain the planner by writing an
appropriate FROM clause. For example:

SELECT ... FROM a,b,c
Planner will consider all three possible join orders:
(a cross join b) cross join c
(a cross join c) cross join b
(b cross join c) cross join a

SELECT ... FROM (a cross join b) cross join c
Planner will only consider joining a with b and then adding c.

This might be construed as either a feature or a bug depending on your
views about how much responsibility to put on the planner vs. the user.
In any case it seems like a reasonable first-cut implementation; we can
think about improvements later.

Thoughts, objections, better ideas?

One thing I'm not very clear on is how much has been done in the parser
about the ISO rules for visibility and aliasing of table columns in
explicit JOINs. Thomas, can you sketch out where we stand in that area?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message nataraj 2000-08-25 20:46:35 Re: postgres 7.0.2
Previous Message Rini Dutta 2000-08-25 19:20:59 queries and inserts