Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-11-23 11:23:45
Message-ID: CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 19, 2014 at 11:49 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
>
>> On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
>> wrote:
>>
>>>
>>> I propose that we keep track of whether there are any potentially
>>> skippable joins at the top of the plan. When we begin execution we do
>>> a single if test to see if there is run-time work to do. If we pass
>>> the run-time tests we then descend the tree and prune the plan to
>>> completely remove unnecessary nodes. We end with an EXPLAIN and
>>> EXPLAIN ANALYZE that looks like this
>>>
>>> > QUERY PLAN
>>> > ------------------------------------------------------------------
>>> > Aggregate (actual rows=1 loops=1)
>>> > -> Seq Scan on t1 (actual rows=1000000 loops=1)
>>>
>>> Doing that removes all the overheads and complexity; it also matches
>>> how join removal currently works.
>>>
>>>
>>
>>
I've attached an updated patch which works in this way. All of the skipping
code that I had added to the executor's join functions has now been removed.

Here's an example output with the plan trimmed, and then untrimmed.

set constraints b_c_id_fkey deferred;
explain (costs off) select b.* from b inner join c on b.c_id = c.id;
QUERY PLAN
---------------
Seq Scan on b
(1 row)

-- add a item to the trigger queue by updating a referenced record.
update c set id = 2 where id=1;
explain (costs off) select b.* from b inner join c on b.c_id = c.id;
QUERY PLAN
------------------------------
Hash Join
Hash Cond: (b.c_id = c.id)
-> Seq Scan on b
-> Hash
-> Seq Scan on c
(5 rows)

A slight quirk with the patch as it stands is that I'm unconditionally NOT
removing Sort nodes that sit below a MergeJoin node. The reason for this is
that I've not quite figured out a way to determine if the Sort order is
required still.

An example of this can be seen in the regression tests:

-- check merge join nodes are removed properly
set enable_hashjoin = off;
-- this should remove joins to b and c.
explain (costs off)
select COUNT(*) from a inner join b on a.b_id = b.id left join c on a.id =
c.id;
QUERY PLAN
---------------------------
Aggregate
-> Sort
Sort Key: a.b_id
-> Seq Scan on a
(4 rows)

As the patch stands there's still a couple of FIXMEs in there, so there's
still a bit of work to do yet.

Comments are welcome

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-11-24_7cde1e4.patch application/octet-stream 58.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Koković 2014-11-23 18:37:58 make check-world regress failed
Previous Message Amit Kapila 2014-11-23 06:45:00 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]