Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-21 02:47:35
Message-ID: FE52C715-1481-4B24-9783-290255C7D3A1@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kaigai-san,

I reviewed the Custom/Foreign join API patch again after writing a patch of join push-down support for postgres_fdw.

2015/03/26 10:51、Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> のメール:

>>> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just
>> building (or searching from a list) a RelOptInfo for given relids. After that
>> make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
>> type to generate actual Paths implements the join. make_join_rel() is called
>> only once for particular relid combination, and there SpecialJoinInfo and
>> restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
>> for FDW cases.
>>>
>>> I like that idea, but I think we will have complex hook signature, it won't
>> remain as simple as hook (root, joinrel).
>>
>> Signature of the hook (or the FDW API handler) would be like this:
>>
>> typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
>> RelOptInfo *joinrel,
>> RelOptInfo *outerrel,
>> RelOptInfo *innerrel,
>> JoinType jointype,
>> SpecialJoinInfo *sjinfo,
>> List *restrictlist);
>>
>> This is very similar to add_paths_to_joinrel(), but lacks semifactors and
>> extra_lateral_rels. semifactors can be obtained with
>> compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed
>> from root->placeholder_list as add_paths_to_joinrel() does.
>>
>> From the viewpoint of postgres_fdw, jointype and restrictlist is necessary to
>> generate SELECT statement, so it would require most work done in make_join_rel
>> again if the signature was hook(root, joinrel). sjinfo will be necessary for
>> supporting SEMI/ANTI joins, but currently it is not in the scope of postgres_fdw.
>>
>> I guess that other FDWs require at least jointype and restrictlist.
>>
> The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
> 'joinrel' is actually built and both of child relations are managed by same
> FDW driver, prior to any other built-in join paths.
> I adjusted the hook definition a little bit, because jointype can be reproduced
> using SpecialJoinInfo. Right?

Yes, it can be derived from the expression below:
jointype = sjinfo ? sjinfo->jointype : JOIN_INNER;

>
> Probably, it will solve the original concern towards multiple calls of FDW
> handler in case when it tries to replace an entire join subtree with a foreign-
> scan on the result of remote join query.
>
> How about your opinion?

AFAIS it’s well-balanced about calling count and available information.

New FDW API GetForeignJoinPaths is called only once for a particular combination of join, such as (A join B join C). Before considering all joins in a join level (number of relations contained in the join tree), possible join combinations of lower join level are considered recursively. As Tom pointed out before, say imagine a case like ((huge JOIN large) LEFT JOIN small), expensive path in lower join level might be

Here, please let me summarize the changes in the patch as the result of my review.

* Add set_join_pathlist_hook_type in add_paths_to_joinrel
This hook is intended to provide a chance to add one or more CustomPaths for an actual join combination. If the join is reversible, the hook is called for both A * B and B * A. This is different from FDW API but it seems fine because FDWs should have chances to process the join in more abstract level than CSPs.

Parameters are same as hash_inner_and_outer, so they would be enough for hash-like or nestloop-like methods. I’m not sure whether mergeclause_list is necessary as a parameter or not. It’s information for merge join which is generated when enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it for processing a join in its own way? Then it must be in parameter list because select_mergejoin_clauses is static so it’s not accessible from external modules.

The timing of the hooking, after considering all built-in path types, seems fine because some of CSPs might want to use built-in paths as a template or a source.

One concern is in the document of the hook function. "Implementing Custom Paths” says:

> A custom scan provider will be also able to add paths by setting the following hook, to replace built-in join paths by custom-scan that performs as if a scan on preliminary joined relations, which us called after the core code has generated what it believes to be the complete and correct set of access paths for the join.

I think “replace” would mis-lead readers that CSP can remove or edit existing built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC CSP can just add paths for the join relation, and planner choose it if it’s the cheapest.

* Add new FDW API GetForeignJoinPaths in make_join_rel
This FDW API is intended to provide a chance to add ForeignPaths for a join relation. This is called only once for a join relation, so FDW should consider reversed combination if it’s meaningful in their own mechanisms.

Note that this is called only when the join relation was *NOT* found in the PlannerInfo, to avoid redundant calls.

Parameters seems enough for postgres_fdw to process N-way join on remote side with pushing down join conditions and remote filters.

* Propagate FDW information through bottom-up planning
FDW can handle a join which uses foreign tables managed by the FDW, of course. We obtain FDW routine entry to plan a scan against a foreign table, so propagating the information up to join phase would help core planner to check the all sources are managed by one FDW or not. It also avoids repeated catalog accesses.

* Make create_plan_recurse non-static
This is for CSPs and FDWs which want underlying plan nodes of a join. For example, a CSP might want outer/inner plan nodes as input sources of a join.

* Treat scanrelid == 0 as pseudo scan
A foreign/custom join is represented by a scan against a pseudo relation, i.e. result of a join. Usually Scan has valid scanrelid, oid of a relation being scanned, and many functions assume that it’s always valid. The patch adds another code paths for scanrelid == 0 as custom/foreign join scans.

* Pseudo scan target list support
CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for column reference tracking. A scan generated for custom/foreign join would have column from multiple relations in its target list, i.e. output columns. Ordinary scans have all valid columns of the relation as output, so references to them can be resolved easily, but we need an additional mechanism to determine where a reference in a target list of custom/foreign scan come from. This is very similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect reference to another relation’s var.

For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan and ForeignScan. For this change, new BitmapSet function bms_shift_members is added.

set_deparse_planstate is also changed to pass ps_tlist as namespace for deparsing.

These chanes seems reasonable, so I mark this patch as “ready for committers” to hear committers' thoughts.

Regards,
--
Shigeru HANADA
shigeru(dot)hanada(at)gmail(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-04-21 04:13:56 installcheck missing in src/bin/pg_rewind/Makefile
Previous Message Kyotaro HORIGUCHI 2015-04-21 01:07:03 Re: Optimization for updating foreign tables in Postgres FDW