Re: Removing INNER JOINs

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, Mart Kelder <mart(at)kelder31(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Removing INNER JOINs
Date: 2015-03-16 09:55:13
Message-ID: CAApHDvroLRWYr+_yoihJ+6_2g_BghBwZDPmJgEg0nRYSANZ-Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 January 2015 at 08:36, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 1/13/15 5:02 AM, David Rowley wrote:
>
>>
>> I can't quite get my head around what you mean here, as the idea sounds
>> quite similar to something that's been discussed already and ruled out.
>> If we're joining relation a to relation b, say the plan chosen is a merge
>> join. If we put some special node as the parent of the merge join then how
>> will we know to skip or not skip any sorts that are there especially for
>> the merge join, or perhaps the planner choose an index scan as a sorted
>> path, where now that the join is removed, could become a faster seqscan.
>> The whole plan switching node discussion came from this exact problem.
>> Nobody seemed to like the non-optimal plan that was not properly optimised
>> for having the relation removed.
>>
>
> In my mind this is the same as a root level Alternative Plan, so you'd be
> free to do whatever you wanted in the alternate:
>
> -> blah blah
> -> Alternate
> -> Merge join
> ...
> -> SeqScan
>
> I'm guessing this would be easier to code, but that's just a guess. The
> other advantage is if you can't eliminate the join to table A at runtime
> you could still eliminate table B, whereas a top-level Alternate node
> doesn't have that flexibility.
>
> This does have a disadvantage of creating more plan variations to
> consider. With a single top-level Alternate node there's only one other
> option. I believe multiple Alternates would create more options to consider.
>
> Ultimately, unless this is easier to code than a top-level alternate, it's
> probably not worth pursuing.
>
>
I think it's probably possible to do this, but I think it would require
calling make_one_rel() with every combination of each possibly removable
relations included and not included in the join list. I'm thinking this
could end up a lot of work as the number of calls to make_one_rel() would
be N^2, where N is the number of relations that may be removable.

My line of thought was more along the lines of that the backup/all purpose
plan will only be used in very rare cases. Either when a fk has been
deferred or if the query is being executed from within a volatile function
which has been called by an UPDATE statement which has just modified the
table causing a foreign key trigger to be queued. I'm willing to bet
someone does this somewhere in the world, but the query that's run would
also have to have a removable join. (One of the regression tests I've added
exercises this)

For that reason I thought it was best to generate only 2 plans. One with
*all* possible removable rels removed, and a backup one with nothing
removed which will be executed if there's any FK triggers queued up.

> It also seems that transitioning through needless nodes comes at a cost.
>> This is why I quite liked the Alternative Plan node idea, as it allowed me
>> to skip over the alternative plan node at plan initialisation.
>>
>
> For init I would expect this to result in a smaller number of nodes than a
> top-level Alternate, because you wouldn't be duplicating all the stuff
> above the joins. That said, I rather doubt it's worth worrying about the
> cost to init; won't it be completely swamped by extra planning cost, no
> matter how we go about this?
>
>
I'm not worried about the cost of the decision at plan init time. I was
just confused about what Tom was recommending. I couldn't quite decide from
his email if he meant to keep the alternative plan node around so that the
executor must transition through it for each row, or to just choose the
proper plan at executor init and return the actual root of the selected
plan instead of returning the initialised AlternativePlan node (see
nodeAlternativePlan.c)

The two ways of doing this have a massively different look in the EXPLAIN
output. With the method the patch currently implements only 1 of the 2
alternative plans are seen by EXPLAIN, this is because I've coded
ExecInitAlternativePlan() to return the root node only 1 of the 2 plans. If
I had kept the AlternativePlan node around then the EXPLAIN output would
have 2 plans, both sitting under the AlternativePlan node.

I've attached a rebased patch which is based on master as of today.

Any comments/reviews are welcome.

Regards

David Rowley

Attachment Content-Type Size
inner_join_removals_2014-03-16_6c2f36d.patch application/octet-stream 112.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2015-03-16 10:15:50 Re: Removing INNER JOINs
Previous Message Dmitry Voronin 2015-03-16 08:31:29 Question about TEMP tables