|From:||Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>|
|To:||Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>|
|Cc:||Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
> I reviewed the Custom/Foreign join API patch again after writing a patch of join
> push-down support for postgres_fdw.
Thanks for your dedicated jobs, my comments are inline below.
> Here, please let me summarize the changes in the patch as the result of my review.
> * Add set_join_pathlist_hook_type in add_paths_to_joinrel
> This hook is intended to provide a chance to add one or more CustomPaths for an
> actual join combination. If the join is reversible, the hook is called for both
> A * B and B * A. This is different from FDW API but it seems fine because FDWs
> should have chances to process the join in more abstract level than CSPs.
> Parameters are same as hash_inner_and_outer, so they would be enough for hash-like
> or nestloop-like methods. I’m not sure whether mergeclause_list is necessary
> as a parameter or not. It’s information for merge join which is generated when
> enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it
> for processing a join in its own way? Then it must be in parameter list because
> select_mergejoin_clauses is static so it’s not accessible from external modules.
I think, a preferable way is to reproduce the mergeclause_list by extension itself,
rather than pass it as a hook argument, because it is uncertain whether CSP should
follow "enable_mergejoin" parameter even if it implements a logic like merge-join.
Of course, it needs to expose select_mergejoin_clauses. It seems to me a straight-
> The timing of the hooking, after considering all built-in path types, seems fine
> because some of CSPs might want to use built-in paths as a template or a source.
> One concern is in the document of the hook function. "Implementing Custom Paths”
> > A custom scan provider will be also able to add paths by setting the following
> hook, to replace built-in join paths by custom-scan that performs as if a scan
> on preliminary joined relations, which us called after the core code has generated
> what it believes to be the complete and correct set of access paths for the join.
> I think “replace” would mis-lead readers that CSP can remove or edit existing
> built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC
> CSP can just add paths for the join relation, and planner choose it if it’s the
I adjusted the documentation stuff as follows:
A custom scan provider will be also able to add paths by setting the
following hook, to add <literal>CustomPath</> nodes that perform as
if built-in join logic doing. It is typically expected to take two
input relations then generate a joined output stream, or just scans
preliminaty joined relations like materialized-view. This hook is
called next to the consideration of core join logics, then planner
will choose the best path to run the relations join in the built-in
and custom ones.
Probably, it can introduce what this hook works correctly.
v12 patch updated only this portion.
> * Add new FDW API GetForeignJoinPaths in make_join_rel
> This FDW API is intended to provide a chance to add ForeignPaths for a join relation.
> This is called only once for a join relation, so FDW should consider reversed
> combination if it’s meaningful in their own mechanisms.
> Note that this is called only when the join relation was *NOT* found in the
> PlannerInfo, to avoid redundant calls.
Yep, it is designed according to the discussion upthreads.
It can produce N-way remote join paths even if intermediate join relation is
more expensive than local join + two foreign scan.
> Parameters seems enough for postgres_fdw to process N-way join on remote side
> with pushing down join conditions and remote filters.
You ensured it clearly.
> * Treat scanrelid == 0 as pseudo scan
> A foreign/custom join is represented by a scan against a pseudo relation, i.e.
> result of a join. Usually Scan has valid scanrelid, oid of a relation being
> scanned, and many functions assume that it’s always valid. The patch adds another
> code paths for scanrelid == 0 as custom/foreign join scans.
> * Pseudo scan target list support
> CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for
> column reference tracking. A scan generated for custom/foreign join would have
> column from multiple relations in its target list, i.e. output columns. Ordinary
> scans have all valid columns of the relation as output, so references to them
> can be resolved easily, but we need an additional mechanism to determine where
> a reference in a target list of custom/foreign scan come from. This is very
> similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect
> reference to another relation’s var.
Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core planner
which columns of relations are referenced, and which attribute represents what
columns/relations. It is an interface contract when foreign/custom-scan is chosen
instead of the built-in join logic.
> For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan
> and ForeignScan. For this change, new BitmapSet function bms_shift_members is
> set_deparse_planstate is also changed to pass ps_tlist as namespace for
Yep, it is same as IndexOnlyScan.
> These chanes seems reasonable, so I mark this patch as “ready for committers”
> to hear committers' thoughts.
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
|Next Messagefirstname.lastname@example.org||2015-04-22 05:32:55||Authenticating from SSL certificates|
|Previous Message||Jan de Visser||2015-04-22 02:23:23||Re: Idea: closing the loop for "pg_ctl reload"|