WIP: patch to create explicit support for semi and anti joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WIP: patch to create explicit support for semi and anti joins
Date: 2008-08-14 00:31:59
Message-ID: 18164.1218673919@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's a snapshot of my current work on improving IN/EXISTS support,
pursuant to the discussion here:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00347.php
I'm posting it because it's a pretty large patch already, and because
I wanted to get comments on the next step I'm planning.

What's done:

Introduce JOIN_SEMI and JOIN_ANTI join types, the former replacing
JOIN_IN. Unify the InClauseInfo and OuterJoinInfo infrastructure into
"SpecialJoinInfo". Convert IN, EXISTS, and NOT EXISTS clauses at top
level of WHERE into semi and anti joins respectively. Recognize
LEFT JOIN with a suitable IS NULL filter condition as an anti join.
This all compiles and passes the regression tests.

What's not done:

nodeMergejoin.c doesn't yet handle JOIN_ANTI. (This is just a SMOP,
but it's a lot more complicated than the nestloop or hash logic, and
I figured nestloop and hash were enough for testing the planner.)
In the meantime, joinpath.c contains some crude hacks to prevent a merge
antijoin path from being created.

Selectivity estimation for semijoins and antijoins is still mostly bogus.

What I have in mind to do to fix selectivity estimation is to pass around
the SpecialJoinInfo struct associated with any join operation other than
a plain-vanilla INNER join. (Unifying InClauseInfo and OuterJoinInfo has
ensured that there always is one.) This is happening in the core planner
already, as of the attached patch. But in order to get anything useful
done it has to be made available to operator join selectivity estimation
functions too --- in particular, eqjoinsel and neqjoinsel, which have
been limping along for years with the knowledge that they didn't know
enough about their context. Having the SpecialJoinInfo will let them
know not only that they are estimating for a nonstandard join, but
which side of the join is doing what.

For backwards compatibility, I think we should allow oprjoin functions
to have either the signature (internal, oid, internal, smallint)
(really PlannerInfo *, Oid, List *, JoinType) or the signature
(internal, oid, internal, smallint, internal)
(really PlannerInfo *, Oid, List *, JoinType, SpecialJoinInfo *).
The former is the old style and will allow us to not break user-defined
selectivity functions in 8.4. But we should convert all the built-in
oprjoin functions to the new style.

Comments?

Barring objections, I'm thinking of committing what I have so far
and then starting to work on the selectivity fixes as a separate patch.

regards, tom lane

Attachment Content-Type Size
semi-anti-joins-1.patch.gz application/octet-stream 52.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-08-14 00:44:00 Re: SeqScan costs
Previous Message Gregory Stark 2008-08-14 00:01:29 Re: SeqScan costs