Re: Patch to support SEMI and ANTI join removal

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>
Subject: Re: Patch to support SEMI and ANTI join removal
Date: 2014-11-19 10:49:08
Message-ID: CAApHDvr0qO4NQKGAO-DjDyzvvG-KpNc41sRORjRcfzHKfLe66w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
>
>>
>> I propose that we keep track of whether there are any potentially
>> skippable joins at the top of the plan. When we begin execution we do
>> a single if test to see if there is run-time work to do. If we pass
>> the run-time tests we then descend the tree and prune the plan to
>> completely remove unnecessary nodes. We end with an EXPLAIN and
>> EXPLAIN ANALYZE that looks like this
>>
>> > QUERY PLAN
>> > ------------------------------------------------------------------
>> > Aggregate (actual rows=1 loops=1)
>> > -> Seq Scan on t1 (actual rows=1000000 loops=1)
>>
>> Doing that removes all the overheads and complexity; it also matches
>> how join removal currently works.
>>
>>
> This sounds much cleaner than what I have at the moment, although, you say
> EXPLAIN would look like that... I don't think that's quite true as the
> EXPLAIN still would have the un-pruned version, as the pruning would be
> done as executor start-up. Would it cause problems to have the EXPLAIN have
> a different looking plan than EXPLAIN ANALYZE?
>
>
>
Oops, It seems you're right about the EXPLAIN output. I had not previously
realised that plain old EXPLAIN would initialise the plan. It's nice to see
that I'll get my old tests working again!

I've been hacking away at this, and I've now got a function which
"implodes" the plan down to just what is required, I'm just calling this
function is there are no pending foreign key triggers.

Writing this has made me realise that I may need to remove the
functionality that I've added to the planner which, after it removes 1
inner join, it puts that relation in an "ignore list" and tries again to
remove other relations again, but this time ignoring any vars from ignored
relations. The problem I see with this is that, with a plan such as:

Hash Join
Hash Cond: (t1.id = t4.id)
-> Hash Join
Hash Cond: (t1.id = t3.id)
-> Hash Join
Hash Cond: (t1.id = t2.id)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t2
-> Hash
-> Seq Scan on t3
-> Hash
-> Seq Scan on t4

If t1 and t4 are marked as "can remove", then the code that "implodes" plan
to remove the nodes which are no longer required would render the plan a
bit useless as there's no join between t2 and t3, we'd need to keep t1 in
this case, even though non of it's Vars are required. Perhaps I could fix
this by writing some more intelligent code which would leave joins in place
in this situation, and maybe I could coerce the planner into not producing
plans like this by lowering the costs of joins where 1 of the relations
could be removed. Andres did mention lowering costs previously, but at the
time I'd not realised why it was required.

I'm also a little concerned around Merge Joins, as if I removed a Merge
Join, because one of the relations was not required, and just left, say the
SeqScan node for the other relation in place of the Merge Join, then I'd
need to somehow check that none of the parent nodes were expecting some
specific sort order. Perhaps I could just always leave any Sort node in
place, if it existed, and just put the scan below that, but it all feels a
bit like executor performing voodoo on the plan... i.e. just feels like a
little bit more than the executor should know about plans. I'm a bit
worried that I could spend a week on this and Tom or someone else then
comes along and throws it out.

So I'm really just looking for some confirmation to if this is a good or
bad idea, based on the discoveries I've explained above. I really want to
see this stuff working, but at the same time don't want to waste time on it
if it's never going to be committed.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2014-11-19 11:18:21 Re: postgres_fdw behaves oddly
Previous Message Fujii Masao 2014-11-19 10:12:31 Re: PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.