Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-22 02:33:50
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8010D40FD@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hanada-san,

> I reviewed the Custom/Foreign join API patch again after writing a patch of join
> push-down support for postgres_fdw.
>
Thanks for your dedicated jobs, my comments are inline below.

> Here, please let me summarize the changes in the patch as the result of my review.
>
> * Add set_join_pathlist_hook_type in add_paths_to_joinrel
> This hook is intended to provide a chance to add one or more CustomPaths for an
> actual join combination. If the join is reversible, the hook is called for both
> A * B and B * A. This is different from FDW API but it seems fine because FDWs
> should have chances to process the join in more abstract level than CSPs.
>
> Parameters are same as hash_inner_and_outer, so they would be enough for hash-like
> or nestloop-like methods. I’m not sure whether mergeclause_list is necessary
> as a parameter or not. It’s information for merge join which is generated when
> enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it
> for processing a join in its own way? Then it must be in parameter list because
> select_mergejoin_clauses is static so it’s not accessible from external modules.
>
I think, a preferable way is to reproduce the mergeclause_list by extension itself,
rather than pass it as a hook argument, because it is uncertain whether CSP should
follow "enable_mergejoin" parameter even if it implements a logic like merge-join.
Of course, it needs to expose select_mergejoin_clauses. It seems to me a straight-
forward way.

> The timing of the hooking, after considering all built-in path types, seems fine
> because some of CSPs might want to use built-in paths as a template or a source.
>
> One concern is in the document of the hook function. "Implementing Custom Paths”
> says:
>
> > A custom scan provider will be also able to add paths by setting the following
> hook, to replace built-in join paths by custom-scan that performs as if a scan
> on preliminary joined relations, which us called after the core code has generated
> what it believes to be the complete and correct set of access paths for the join.
>
> I think “replace” would mis-lead readers that CSP can remove or edit existing
> built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC
> CSP can just add paths for the join relation, and planner choose it if it’s the
> cheapest.
>
I adjusted the documentation stuff as follows:

A custom scan provider will be also able to add paths by setting the
following hook, to add <literal>CustomPath</> nodes that perform as
if built-in join logic doing. It is typically expected to take two
input relations then generate a joined output stream, or just scans
preliminaty joined relations like materialized-view. This hook is
called next to the consideration of core join logics, then planner
will choose the best path to run the relations join in the built-in
and custom ones.

Probably, it can introduce what this hook works correctly.
v12 patch updated only this portion.

> * Add new FDW API GetForeignJoinPaths in make_join_rel
> This FDW API is intended to provide a chance to add ForeignPaths for a join relation.
> This is called only once for a join relation, so FDW should consider reversed
> combination if it’s meaningful in their own mechanisms.
>
> Note that this is called only when the join relation was *NOT* found in the
> PlannerInfo, to avoid redundant calls.
>
Yep, it is designed according to the discussion upthreads.
It can produce N-way remote join paths even if intermediate join relation is
more expensive than local join + two foreign scan.

> Parameters seems enough for postgres_fdw to process N-way join on remote side
> with pushing down join conditions and remote filters.
>
You ensured it clearly.

> * Treat scanrelid == 0 as pseudo scan
> A foreign/custom join is represented by a scan against a pseudo relation, i.e.
> result of a join. Usually Scan has valid scanrelid, oid of a relation being
> scanned, and many functions assume that it’s always valid. The patch adds another
> code paths for scanrelid == 0 as custom/foreign join scans.
>
Right,

> * Pseudo scan target list support
> CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for
> column reference tracking. A scan generated for custom/foreign join would have
> column from multiple relations in its target list, i.e. output columns. Ordinary
> scans have all valid columns of the relation as output, so references to them
> can be resolved easily, but we need an additional mechanism to determine where
> a reference in a target list of custom/foreign scan come from. This is very
> similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect
> reference to another relation’s var.
>
Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core planner
which columns of relations are referenced, and which attribute represents what
columns/relations. It is an interface contract when foreign/custom-scan is chosen
instead of the built-in join logic.

> For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan
> and ForeignScan. For this change, new BitmapSet function bms_shift_members is
> added.
>
> set_deparse_planstate is also changed to pass ps_tlist as namespace for
> deparsing.
>
Yep, it is same as IndexOnlyScan.

> These chanes seems reasonable, so I mark this patch as “ready for committers”
> to hear committers' thoughts.
>
Thanks!
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-v9.5-custom-join.v12.patch application/octet-stream 41.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message keenan@thebrocks.net 2015-04-22 05:32:55 Authenticating from SSL certificates
Previous Message Jan de Visser 2015-04-22 02:23:23 Re: Idea: closing the loop for "pg_ctl reload"