Re: Removing INNER JOINs

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Mart Kelder <mart(at)kelder31(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Removing INNER JOINs
Date: 2014-12-10 10:04:39
Message-ID: CAApHDvpk8M6_PMH_rzJm6i2AiPZQz5emMZ-c8vfhrCxOH334Xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4 December 2014 at 06:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> Far better would be what I mentioned upthread: an explicit switch node
> in the plan tree, analogous to the existing AlternativeSubPlan structure.
>
> ChooseJoinSubPlan
> -> plan tree requiring all tables to be joined
> -> plan tree not requiring all tables to be joined
>
> This allows sensible display by EXPLAIN and avoids the need for the core
> executor code to be dirtied with implementation of the precise switch
> rule: all that logic goes into the ChooseJoinSubPlan plan node code.
>
>
I'm not sure if I 100% understand what you mean by sensible EXPLAIN output?

Would you have both complete plans under a ChooseJoinSubPlan? If so I don't
yet understand why that is needed. Plain old EXPLAIN (without analyze)
performs an executor init, which I guess must have obtained a snapshot
somewhere along the line. Couldn't the plan just be selected at init time?
and just have the ChooseJoinSubPlan, or whatever select the correct plan?

I've attached a version which does this, and I think the EXPLAIN is quite
sensible... It shows the join removed plan when joins are removed, and
shows the "All Purpose" plan when there's FK triggers pending.

The patch has a few fixme's in it, but I think it's enough to demonstrate
how I think it could work.

The bulk of my changes are in allpaths.c, planmain.c and planner.c. The
critical change is query_planner() now returns a List instead of
a RelOptInfo. I wasn't quite sure how else to handle this. Please also
notice the change to make_one_rel(). This function is now called twice if
remove_useless_joins() found 1 or more INNER JOINs to be possibly
removable. in remove_useless_joins() the rels are not marked as
RELOPT_DEADREL like they are with LEFT JOINs, they remain as
RELOPT_BASEREL, only they have the skipFlags to mark that they can be
removed when there's no FK triggers pending. A flag on PlannerGlobal is
also set which will later force make_one_rel() to be called twice. Once for
the join removal plan, and once for the "All Purpose" plan. query_planner()
then returns a list of the RelOptInfos of those 2 final rels created by
make_one_rel(). All the processing that previously got done on that final
rel now gets done on the list of final rels. If there's more than 1 in that
list then I'm making the root node of the plan an "AlternativePlan" node.
On init of this node during execution time there is some logic which
chooses which plan to execute. The code here just calls ExecInitNode() on
the root node of the selected plan and returns that, thus skipping over the
AlternativePlan node, so that it can't be seen by EXPLAIN or EXPLAIN
ANALYZE.

The patch has grown quite a bit, but most of the extra size was generated
from having to indent a whole bunch code in grouping_planner() by 1 more
tab.

All regression tests pass and we're back to getting the most efficient
plans again. i.e no extra redundant sort node on merge joins! :-)

I'm keen to know if the attached patch is a viable option.

Oh, and if anyone is wondering why I made skipFlags a bit mask and not just
a bool. I think it would also be possible to expand LEFT JOIN removal at
some later date to allow deferrable unique constraints to remove LEFT
JOINs. This is currently not possible due to the planner not knowing if
there will be unique violations when the plan is executed. Quite possibly
this could be handled in a similar way to how the INNER JOINs work in the
attached.

Also, apologies for the late response on this. I've been busy the most
evenings this week so far. I was quite happy when I woke up in the morning
to see all the responses about this. So thank you everyone for generating
some great ideas. Hopefully I'll get this one nailed down soon.

Regards

David Rowley

> I would envision the planner starting out generating the first subplan
> (without the optimization), but as it goes along, noting whether there
> are any opportunities for join removal. At the end, if it found that
> there were such opportunities, re-plan assuming that removal is possible.
> Then stick a switch node on top.
>
> This would give optimal plans for both cases, and it would avoid the need
> for lots of extra planner cycles when the optimization can't be applied
> ... except for one small detail, which is that the planner has a bad habit
> of scribbling on its own input. I'm not sure how much cleanup work would
> be needed before that "re-plan" operation could happen as easily as is
> suggested above. But in principle this could be made to work.
>
> regards, tom lane
>

Attachment Content-Type Size
inner_join_removals_2014-12-10_7dc6756.patch application/octet-stream 107.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-12-10 12:26:18 Re: GSSAPI, SSPI - include_realm default
Previous Message Bruce Momjian 2014-12-10 09:58:21 Re: Small TRUNCATE glitch