Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch to support SEMI and ANTI join removal
Date: 2014-08-05 10:35:45
Message-ID: CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been working away at allowing semi and anti joins to be added to the
list of join types that our join removal code supports.

The basic idea is that we can removal a semi or anti join if the left hand
relation references the relation that's being semi/anti joined if the join
condition matches a foreign key definition on the left hand relation.

To give an example:

Given the 2 tables:
create table t2 (t1_id int primary key);
create table t1 (value int references t2);

The join to t2 would not be required in:

select * from t1 where value in(select t1_id from t2);

Neither would it be here:

select * from t1 where not exists(select 1 from t2 where t1_id=value);

To give a bit of background, I initially proposed the idea here:

http://www.postgresql.org/message-id/CAApHDvq0NAi8cEqTNNdqG6mhFH__7_A6Tn9XU4V0cut9wab4gA@mail.gmail.com

And some issues were raised around the fact that updates to the referenced
relation would only flush out changes to the referencing tables on
completion of the command, and if we happened to be planning a query that
was located inside a volatile function then we wouldn't know that the
parent query hadn't updated some of these referenced tables.

Noah raised this concern here:
http://www.postgresql.org/message-id/20140603235053.GA351732@tornado.leadboat.com
But proposed a solution here:
http://www.postgresql.org/message-id/20140605000407.GA390318@tornado.leadboat.com

In the attached I've used Noah's solution to the problem, and it seems to
work just fine. (See regression test in the attached patch)

Tom raised a point here:
http://www.postgresql.org/message-id/19326.1401891282@sss.pgh.pa.us

Where he mentioned that it may be possible that the foreign key trigger
queue gets added to after planning has taken place.
I've spent some time looking into this and I've not yet managed to find a
case where this matters as it seems that updates made in 1 command are not
visible to that same command. I've tested various different test cases in
all transaction isolation levels and also tested update commands which call
volatile functions that perform updates in the same table that the outer
update will reach later in the command.

The patch (attached) is also now able to detect when a NOT EXISTS clause
cannot produce any records at all.

If I make a simple change to the tables I defined above:

ALTER TABLE t1 ALTER COLUMN value SET NOT NULL;

Then the following will be produced:

explain (costs off) select * from t1 where not exists(select 1 from t2
where t1_id=value);
QUERY PLAN
--------------------------
Result
One-Time Filter: false
-> Seq Scan on t1

A small note on my intentions with this patch:

I'm not seeing the use case for all of this to be massive, I'm more
interested in this patch to use it as a stepping stone towards implementing
INNER JOIN removals which would use foreign keys in a similar way to
attempt to prove that the join is not required. I decided to tackle semi
and anti joins first as these are a far more simple case, and it also adds
quite a bit of the infrastructure that would be required for inner join
removal, plus if nobody manages to poke holes in my ideas with this then I
should have good grounds to begin the work on the inner join removal code.
I also think if we're bothering to load foreign key constraints at planning
time, then only using them for inner join removals wouldn't be making full
use of them, so likely this patch would be a good idea anyway.

Currently most of my changes are in analyzejoin.c, but I did also have to
make changes to load the foreign key constraints so that they were
available to the planner. One thing that is currently lacking, which would
likely be needed, before the finished patch is ready, would be a
"relhasfkeys" column in pg_class. Such a column would mean that it would be
possible to skip scanning pg_constraint for foreign keys when there's none
to find. I'll delay implementing that until I get a bit more feedback to
weather this patch would be a welcome addition to the existing join removal
code or not.

I'm submitting this (a little early) for the August commitfest, but if
anyone has any time to glance at it before then then that would be a really
good help.

Regards

David Rowley

Attachment Content-Type Size
semianti_join_removal_2410c7c_2014-08-05.patch application/octet-stream 51.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message testman1316 2014-08-05 10:53:33 Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
Previous Message Marti Raudsepp 2014-08-05 10:16:32 Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?