Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-07-02 14:13:36
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80110FA61@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> > Let me introduce a few cases we should pay attention to.
> >
> > Foreign/CustomScan node may stack; that means a Foreign/CustomScan node
> > may have child node that includes another Foreign/CustomScan node with
> > scanrelid==0.
> > (At this moment, ForeignScan cannot have a child node; however, more
> > aggressive push-down [1] will need the same feature to fetch tuples from
> > a local relation and construct a VALUES() clause.)
> > In this case, the highest Foreign/CustomScan node (that is also nearest
> > to LockRows or ModifyTable) runs the alternative sub-plan that includes
> > scan/join plans dominated by fdw_relids or custom_relids.
> >
> > For example:
> > LockRows
> >  -> HashJoin
> >      -> CustomScan (AliceJoin)
> >          -> SeqScan on t1
> >          -> CustomScan (CarolJoin)
> >              -> SeqScan on t2
> >              -> SeqScan on t3
> >      -> Hash
> >          -> CustomScan (BobJoin)
> >              -> SeqScan on t4
> >              -> ForeignScan (remote join involves ft5, ft6)
> >
> > In this case, AliceJoin will have alternative sub-plan to join t1, t2
> > and t3, then it shall be used on EvalPlanQual(). Also, BobJoin will
> > have alternative sub-plan to join t4, ft5 and ft6. CarolJoin and the
> > ForeignScan will also have alternative sub-plan, however, these are
> > not used in this case.
> > Probably, it works fine.
>
> Yeah, I think so too.
>
Sorry, I need to adjust my explanation above a bit:

In this case, AliceJoin will have alternative sub-plan to join t1 and
CarolJoin, then CarolJoin will have alternative sub-plan to join t2 and
t3. Also, BobJoin will have alternative sub-plan to join t4 and the
ForeignScan with remote join, and this ForeignScan node will have
alternative sub-plan to join ft5 and ft6.
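
To make the nesting concrete, here is a minimal sketch in C. It is a toy model, not PostgreSQL code: ToyPlan, its fields, and epq_count_base_scans() are all hypothetical names invented for illustration. It only shows that each pushed-down join node owns one alternative sub-plan for its own level, and that a nested pushed-down join found inside that alternative is resolved through its own alternative in turn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified plan node; NOT PostgreSQL's Plan struct. */
typedef enum { NODE_SEQSCAN, NODE_BUILTIN_JOIN, NODE_PUSHDOWN_JOIN } NodeKind;

typedef struct ToyPlan {
    NodeKind        kind;
    const char     *name;
    struct ToyPlan *left;         /* outer child, or NULL */
    struct ToyPlan *right;        /* inner child, or NULL */
    struct ToyPlan *alt_subplan;  /* set only for NODE_PUSHDOWN_JOIN */
} ToyPlan;

/*
 * Count the base-relation scans reached when a recheck enters "node".
 * A pushed-down join is never descended into directly; its alternative
 * sub-plan is used instead, and that sub-plan may itself contain another
 * pushed-down join, which is handled by the same rule.
 */
static int epq_count_base_scans(const ToyPlan *node)
{
    if (node == NULL)
        return 0;
    switch (node->kind) {
    case NODE_SEQSCAN:
        return 1;
    case NODE_PUSHDOWN_JOIN:
        assert(node->alt_subplan != NULL);
        return epq_count_base_scans(node->alt_subplan);
    case NODE_BUILTIN_JOIN:
        return epq_count_base_scans(node->left) +
               epq_count_base_scans(node->right);
    }
    return 0;
}

/* AliceJoin's alternative joins t1 and CarolJoin; CarolJoin's joins t2 and t3. */
int alice_example_base_scans(void)
{
    ToyPlan t1 = { NODE_SEQSCAN, "t1", NULL, NULL, NULL };
    ToyPlan t2 = { NODE_SEQSCAN, "t2", NULL, NULL, NULL };
    ToyPlan t3 = { NODE_SEQSCAN, "t3", NULL, NULL, NULL };
    ToyPlan carol_alt = { NODE_BUILTIN_JOIN, "join(t2, t3)", &t2, &t3, NULL };
    ToyPlan carol = { NODE_PUSHDOWN_JOIN, "CarolJoin", &t2, &t3, &carol_alt };
    ToyPlan alice_alt = { NODE_BUILTIN_JOIN, "join(t1, CarolJoin)",
                          &t1, &carol, NULL };
    ToyPlan alice = { NODE_PUSHDOWN_JOIN, "AliceJoin", &t1, &carol, &alice_alt };

    return epq_count_base_scans(&alice);
}
```

In this toy model alice_example_base_scans() reaches all three of t1, t2 and t3, although AliceJoin's own alternative only knows about t1 and CarolJoin.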

Why is this recursive design better? Because it makes the planner
enhancement much simpler than the all-at-once approach. Please see my
explanation in the section below.

> > On the next step, how do we implement this design?
> > I guess that planner needs to keep a path that contains neither
> > foreign-join nor custom-join with scanrelid==0.
> > Probably, "cheapest_builtin_path" of RelOptInfo is needed that
> > never contains these remote/custom join logic, as a seed of
> > alternative sub-plan.
>
> Yeah, I think so too, but I've not figured out how to implement this yet.
>
> To be honest, ISTM that it's difficult to do that simply and efficiently
> for the foreign/custom-join-pushdown API that we have for 9.5. It's a
> little late, but what I started thinking is to redesign that API so that
> that API is called at standard_join_search, as discussed in [2]; (1) to
> place that API call *after* the set_cheapest call and (2) to place
> another set_cheapest call after that API call for each joinrel. By the
> first set_cheapest call, I think we could probably save an alternative
> path that we need in "cheapest_builtin_path". By the second
> set_cheapest call following that API call, we could consider
> foreign/custom-join-pushdown paths also. What do you think about this idea?
>
The disadvantages outweigh the advantages, sorry.
The reason why we put the foreign/custom-join hook in add_paths_to_joinrel()
is that, in standard_join_search(), the source relations (inner/outer) are
not obvious, so we cannot reconstruct which relations are the source of
this join.
So I had to give up when I tried this approach before.

My idea is to save the cheapest_total_path of the RelOptInfo into a new
cheapest_builtin_path field just before the GetForeignJoinPaths() hook.
Why? Because of the hook location, it must be a built-in join path, never
a foreign/custom join; only built-in paths have been added at that point.
Even if either or both of the join sub-trees contain a foreign/custom join,
those paths have their own alternative sub-plans at their level, so there
is no need to care about them at the current level. (That is why I
adjusted my explanation above.)
Once this built-in path is kept and a foreign/custom join gets chosen by
set_cheapest(), it is easy to attach this sub-plan to the ForeignScan or
CustomScan node.
I don't see any significant downside to this approach.
What do you think?
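
The ordering I have in mind can be sketched roughly as below. This is a toy model under stated assumptions, not a patch: ToyPath, ToyRel, ToyJoinHook and all toy_* functions are hypothetical stand-ins, and the toy recomputes the cheapest path inline, which is a simplification compared with where the real planner calls set_cheapest(). The only point is the order of steps: built-in paths first, snapshot, then the hook, then attach the snapshot if a pushed-down path wins.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; NOT PostgreSQL's Path / RelOptInfo. */
typedef struct ToyPath {
    double          total_cost;
    int             is_pushdown;  /* 1 = foreign/custom join path */
    struct ToyPath *alt_path;     /* built-in alternative, if any */
} ToyPath;

#define TOY_MAX_PATHS 8

typedef struct ToyRel {
    ToyPath *paths[TOY_MAX_PATHS];
    int      npaths;
    ToyPath *cheapest_total_path;
    ToyPath *cheapest_builtin_path;  /* the proposed new field */
} ToyRel;

typedef void (*ToyJoinHook)(ToyRel *rel);

static void toy_add_path(ToyRel *rel, ToyPath *path)
{
    assert(rel->npaths < TOY_MAX_PATHS);
    rel->paths[rel->npaths++] = path;
}

/* Pick the path with the lowest total cost, or NULL if there is none. */
static void toy_set_cheapest(ToyRel *rel)
{
    rel->cheapest_total_path = NULL;
    for (int i = 0; i < rel->npaths; i++)
        if (rel->cheapest_total_path == NULL ||
            rel->paths[i]->total_cost < rel->cheapest_total_path->total_cost)
            rel->cheapest_total_path = rel->paths[i];
}

static void toy_add_paths_to_joinrel(ToyRel *rel, ToyPath **builtin, int n,
                                     ToyJoinHook hook)
{
    /* 1. Only built-in join paths exist at this point. */
    for (int i = 0; i < n; i++)
        toy_add_path(rel, builtin[i]);

    /* 2. Snapshot the cheapest built-in path before the hook runs. */
    toy_set_cheapest(rel);
    rel->cheapest_builtin_path = rel->cheapest_total_path;

    /* 3. The hook may now add foreign/custom join paths. */
    if (hook != NULL)
        hook(rel);

    /* 4. If a pushed-down path wins, attach the saved alternative. */
    toy_set_cheapest(rel);
    if (rel->cheapest_total_path != NULL &&
        rel->cheapest_total_path->is_pushdown)
        rel->cheapest_total_path->alt_path = rel->cheapest_builtin_path;
}

/* Hypothetical hook that offers one cheap remote-join path. */
static void toy_remote_join_hook(ToyRel *rel)
{
    static ToyPath remote = { 10.0, 1, NULL };
    toy_add_path(rel, &remote);
}

/* Returns the cost of the alternative attached to the winning path. */
double toy_demo_alternative_cost(void)
{
    ToyPath  nestloop = { 100.0, 0, NULL };
    ToyPath  hashjoin = { 80.0, 0, NULL };
    ToyPath *builtin[] = { &nestloop, &hashjoin };
    ToyRel   rel = { { NULL }, 0, NULL, NULL };

    toy_add_paths_to_joinrel(&rel, builtin, 2, toy_remote_join_hook);
    return rel.cheapest_total_path->alt_path->total_cost;
}
```

Here the remote join (cost 10) is chosen, and the cheaper of the two built-in paths (cost 80) travels with it as the alternative.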

Regarding the development timeline, I prefer to put in some workaround
so that we do not hit the Assert() in ExecScanFetch().
We may add a warning to the documentation not to replace a built-in
join if either or both of the sub-trees are the target of UPDATE/DELETE
or FOR SHARE/UPDATE.
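
For an extension author, that warning would amount to a check like the one below. It is only a sketch: ToyRelids and toy_must_keep_builtin_join() are hypothetical, the relation sets are modelled as plain bitmasks instead of the planner's own set type, and how an extension would actually obtain the set of locked or modified relations is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy relation set: bit i set means range-table index i is a member. */
typedef uint32_t ToyRelids;

/*
 * Returns 1 when the join must stay built-in, i.e. when either side of
 * the join covers a relation that is an UPDATE/DELETE target or is
 * locked by FOR SHARE/UPDATE; returns 0 when pushdown is allowed.
 */
int toy_must_keep_builtin_join(ToyRelids outer, ToyRelids inner,
                               ToyRelids locked_or_modified)
{
    ToyRelids joinrelids;

    /* The two sides of a join never share a relation. */
    assert((outer & inner) == 0);

    joinrelids = outer | inner;
    return (joinrelids & locked_or_modified) != 0;
}
```

A hook would call this first and simply add no foreign/custom join path when it returns 1.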

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
