Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-09-11 11:14:33
Message-ID: CAApHDvpDXXvKE+=ug1kA--nKfa=bjrjvK8Gp9G8UYwv6nHckVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 28, 2014 at 6:23 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> If the majority of the added code is code that will be needed for
> less-bogus optimizations, it might be all right; but I'd kind of want to
> see the less-bogus optimizations working first.
>
>
That seems fair. Likely there'd be not a great deal of value to semi and
anti joins removal alone. I was more just trying to spread weight of an
inner join removal patch...

So In response to this, I've gone off and written an inner join removal
support patch (attached), which turned out to be a bit less complex than I
thought.

Here's a quick demo, of the patch at work:

test=# create table c (id int primary key);
CREATE TABLE
test=# create table b (id int primary key, c_id int not null references
c(id));
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references
b(id));
CREATE TABLE
test=#
test=# explain select a.* from a inner join b on a.b_id = b.id inner join c
on b.c_id = c.id;
QUERY PLAN
-----------------------------------------------------
Seq Scan on a (cost=0.00..31.40 rows=2140 width=8)
Planning time: 1.061 ms
(2 rows)

Perhaps not a greatly useful example, but if you can imagine the joins are
hidden in a view and the user is just requesting a small subset of columns,
then it does seem quite powerful.

There's currently a few things with the patch that I'll list below, which
may raise a few questions:

1. I don't think that I'm currently handling removing eclass members
properly. So far the code just removes the Vars that belong to the relation
being removed. I likely should also be doing bms_del_member(ec->ec_relids,
relid); on the eclass, but perhaps I should just be marking the whole class
as "ec_broken = true" and adding another eclass all everything that the
broken one has minus the parts from the removed relation?

2. Currently the inner join removal is dis-allowed if the (would be)
removal relation has *any* baserestrictinfo items. The reason for this is
that we must ensure that the inner join gives us exactly 1 row match on the
join condition, but a baserestrictinfo can void the proof that a foreign
key would give us that a matching row does exist. However there is an
exception to this that could allow that restriction to be relaxed. That is
if the qual in baserestrictinfo use vars that are in an eclass, where the
same eclass also has ec members vars that belong to the rel that we're
using the foreign key for to prove the relation not needed.... umm.. that's
probably better described by example:

Assume there's a foreign key a (x) reference b(x)

SELECT a.* FROM a INNER JOIN b ON a.x = b.x WHERE b.x = 1

relation b should be removable because an eclass will contain {a.x, b.x}
and therefore s baserestrictinfo for a.x = 1 should also exist on relation
a. Therefore removing relation b should produce equivalent results, i.e
everything that gets filtered out on relation b will also be filtered out
on relation a anyway.

I think the patch without this is still worth it, but if someone feels
strongly about it I'll take a bash at supporting it.

3. Currently the inner join support does not allow removals using foreign
keys which contain duplicate columns on the referencing side. e.g (a,a)
references (x,y), this is basically because of the point I made in item 2.
In this case a baserestrictinfo would exist on the referenced relation to
say WHERE x = y. I'd have to remove the restriction described in item 2 and
do a small change to the code that extracts the join condition from the
eclass for this to work. But it's likely a corner case that's not worth too
much trouble to support. I think probably if I saw an FK like that in the
field, I'd probably scratch my head for a while, while trying to
understanding why they bothered.

4. The patch currently only allows removals for eclass join types. If the
rel has any joininfo items, then the join removal is disallowed. From what
I can see equality type inner join conditions get described in eclasses,
and only non-equality join conditions make it into the joininfo list, and
since foreign keys only support equality operators, then I thought this was
a valid restriction, however, if someone can show me a flaw in my
assumption then I may need to improve this.

5. I've added a flag to pg_class called relhasfkey. Currently this gets set
to true when a foreign key is added, though I've added nothing to set it
back to false again. I notice that relhasindex gets set back to false
during vacuum, if vacuum happens to find there to not be any indexes on the
rel. I didn't put my logic here as I wasn't too sure if scanning
pg_constraint during a vacuum seemed very correct, so I just left out the
"setting it to false" logic based on the the fact that I noticed that
relhaspkey gets away with quite lazy setting back to false logic (only when
there's no indexes of any kind left on the relation at all).

The only think else I can think of is perhaps optimising a little. I was
thinking likely most queries wont benefit from this too much, so I was
thinking of adding some logic to skip all join removals by pulling out the
varnos from the target list entries and skipping even attempting to perform
a join removal for a relation that has its varno in the targetlist of the
query. Though perhaps a few benchmarks will determine if this is worth it
or not.

Comments are welcome. -- I'm really hoping this patch generates a bit more
interest than the SEMI/ANTI join removal one!

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-09-11_38cf71c.patch application/octet-stream 60.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-09-11 11:19:25 Re: Support for N synchronous standby servers
Previous Message Andres Freund 2014-09-11 11:01:43 Re: Scaling shared buffer eviction