Removing INNER JOINs

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Removing INNER JOINs
Date: 2014-11-29 09:29:29
Message-ID: CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Starting a new thread which continues on from
http://www.postgresql.org/message-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com

To give a brief summary for any new readers:

The attached patch allows for INNER JOINed relations to be removed from the
plan,
providing none of the columns are used for anything, and a foreign key
exists which
proves that a record must exist in the table being removed which matches
the join
condition:

Example:

test=# create table b (id int primary key);
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references
b(id));
CREATE TABLE
test=# explain (costs off) select a.* from a inner join b on a.b_id = b.id;
QUERY PLAN
---------------
Seq Scan on a

This has worked for a few years now for LEFT JOINs, so this patch is just
extending
the joins types that can be removed.

This optimisation should prove to be quite useful for views in which a
subset of its
columns are queried.

The attached is an updated patch which fixes a conflict from a recent
commit and
also fixes a bug where join removals did not properly work for PREPAREd
statements.

I'm looking for a bit of feedback around the method I'm using to prune the
redundant
plan nodes out of the plan tree at executor startup. Particularly around
not stripping
the Sort nodes out from below a merge join, even if the sort order is no
longer
required due to the merge join node being removed. This potentially could
leave
the plan suboptimal when compared to a plan that the planner could generate
when the removed relation was never asked for in the first place.

There are also other cases such as MergeJoins performing btree index scans
in order to obtain ordered results for a MergeJoin that would be better
executed
as a SeqScan when the MergeJoin can be removed.

Perhaps some costs could be adjusted at planning time when there's a
possibility
that joins could be removed at execution time, although I'm not quite sure
about
this as it risks generating a poor plan in the case when the joins cannot
be removed.

I currently can't see much of a way around these cases, but in both cases
removing
the join should prove to be a win, though just perhaps not with the most
optimal of
plans.

There are some more details around the reasons behind doing this weird
executor
startup plan pruning around here:

http://www.postgresql.org/message-id/20141006145957.GA20577@awork2.anarazel.de

Comments are most welcome

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-11-29_be69869.patch application/octet-stream 60.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2014-11-29 10:54:04 Re: How about a option to disable autovacuum cancellation on lock conflict?
Previous Message Andres Freund 2014-11-29 08:22:52 How about a option to disable autovacuum cancellation on lock conflict?