Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-03-12 04:54:54
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


The attached two patches are the revised custom-plan interface
and example usage that implements existing MergeJoin on top of
this interface.

According to the discussion last week, I revised the portion
where custom-node is expected to perform a particular kind of
task, like scanning a relation, by putting polymorphism with
a set of callbacks set by custom-plan provider.
So, the core backend can handle this custom-plan node just
an abstracted plan-node with no anticipation.
Even though the subject of this message says "custom-scan",
I'd like to name the interface "custom-plan" instead, because
it became fully arbitrary of extension whether it scan on
a particular relation.

Definition of CustomXXXX data types were simplified:

typedef struct CustomPath
Path path;
const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomPlan
Plan plan;
const struct CustomPlanMethods *methods;
} CustomPlan;

typedef struct CustomPlanState
PlanState ps;
const CustomPlanMethods *methods;
} CustomPlanState;

Each types have a base class and a set of function pointers that
characterize the behavior of this custom-plan node.
In usual use-cases, extension is expected to extend these classes
to keep their private data fields needed to implement its own

Most of the methods are designed to work as a thin layer towards
existing planner / executor functions, so custom-plan provides
has to be responsible to implement its method to communicate with
core backend as built-in ones doing.

Regarding to the topic we discussed last week,

* CUSTOM_VAR has gone.
The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN
command output (including column names being referenced) even if
a custom-plan node replaced a join but has no underlying subplans
on left/right subtrees.
A typical situation like this is a remote-join implementation that
I tried to extend postgres_fdw on top of the previous interface.
It retrieves a flat result set of the remote join execution, thus
has no subplan locally. On the other hand, EXPLAIN tries to find
out "actual" Var node from the underlying subplan if a Var node
has special varno (INNER/OUTER/INDEX).
I put a special method to solve the problem. GetSpecialCustomVar
method is called if a certain Var node of custom-plan has a special
varno, then custom-plan provider can inform the core backend
an expression node to be referenced by this Var node.
It allows to solve the column name without recursive walking on
the subtrees, so it enables a custom-plan node that replaces
a part of plan-tree.
This method is optional, so available to adopt existing way if
custom-plan provider does not do anything special.

* Functions to be exposed, from static declaration

Right now, static functions are randomly exposed on demand.
So, we need more investigation which functions are needed, and
which others are not.
According to my trial, the part-2 patch that is MergeJoin on top
of the custom-plan interface, class of functions that recursively
walk on subplan tree have to be exposed. Like, ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan.
In case when custom-plan performs like built-in Append node, it
keeps a list of sub-plans in its private field, so the core backend
cannot know existence of sub-plans, thus its unavailable to make
subplan, unavailable to output EXPLAIN and so on.
It does not make sense to reworking on the extension side again.
Also, createplan.c has many useful functions to construct plan-node,
however, most of them are static because all the built-in plan-node
are constructed by the routines in this file, we didn't need to
expose them to others. I think, functions in createplan.c being
called by create_xxxx_plan() functions to construct plan-node should
be exposed for extension's convenient.

* Definition of add_join_path_hook

I didn't have idea to improve the definition and location of this
hook, so it is still on the tail of the add_paths_to_joinrel().
Its definition was a bit adjusted according to the feedback on
the pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
from the argument list. Indeed, these are specific to the built-in
MergeJoin logic and easy to reproduce.

* Hook location of add_scan_path_hook

I moved the add_scan_path_hook and set_cheapest() into
set_base_rel_pathlists() from various caller locations;
set_xxxx_pathlist() functions typically.
It enabled to consolidate the location to add custom-path for base

* CustomMergeJoin as a proof-of-concept

The contrib module in the part-2 portion is, a merge-join implementation
on top of custom-plan interface, even though 99% of its implementation is
identical with built-in ones.
Its purpose is to demonstrate a custom join logic can be implemented using
custom-plan interface, even if custom-plan node has underlying sub-plans
unlike previous my examples.

NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Friday, March 07, 2014 3:09 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> > I expected to include simple function pointers for copying and
> > text-output as follows:
> > typedef struct {
> > Plan plan;
> > :
> > NodeCopy_function node_copy;
> > NodeTextOut_function node_textout;
> > } Custom;
> I was thinking more like
> typedef struct CustomPathFuncs {
> const char *name; /* used for debugging purposes only */
> NodeCopy_function node_copy;
> NodeTextOut_function node_textout;
> ... etc etc etc ...
> } CustomPathFuncs;
> typedef struct CustomPath {
> Path path;
> const CustomPathFuncs *funcs;
> ... maybe a few more fields here, but not too darn many ...
> } CustomPath;
> and similarly for CustomPlan.
> The advantage of this way is it's very cheap for (what I expect will be)
> the common case where an extension has a fixed set of support functions
> for its custom paths and plans. It just declares a static constant
> CustomPathFuncs struct, and puts a pointer to that into its paths.
> If an extension really needs to set the support functions on a per-object
> basis, it can do this:
> typdef struct MyCustomPath {
> CustomPath cpath;
> CustomPathFuncs funcs;
> ... more fields ...
> } MyCustomPath;
> and then initialization of a MyCustomPath would include
> mypath->cpath.funcs = &mypath->funcs;
> mypath->funcs.node_copy = MyCustomPathCopy;
> ... etc etc ...
> In this case we're arguably wasting one pointer worth of space in the path,
> but considering the number of function pointers such a path will be carrying,
> I don't think that's much of an objection.
> >> So? If you did that, then you wouldn't have renumbered the Vars as
> >> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if
> >> it is needed, then there would also be a need for an additional tuple
> >> slot in executor contexts, which you haven't provided.
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
> Not sure what we'd do for the general case, but CUSTOM_VAR isn't the solution.
> Consider for example a join where both tables supply columns named "id"
> --- if you put them both in one tupledesc then there's no non-kluge way
> to identify them.
> Possibly the route to a solution involves adding another plan-node callback
> function that ruleutils.c would use for printing Vars in custom join nodes.
> Or maybe we could let the Vars keep their original RTE numbers, though that
> would complicate life at execution time.
> Anyway, if we're going to punt on add_join_path_hook for the time being,
> this problem can probably be left to solve later. It won't arise for simple
> table-scan cases, nor for single-input plan nodes such as sorts.
> regards, tom lane

Attachment Content-Type Size
pgsql-v9.4-custom-scan.part-2.v10.patch application/octet-stream 147.1 KB
pgsql-v9.4-custom-scan.part-1.v10.patch application/octet-stream 65.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message David Johnston 2014-03-12 06:10:59 Re: COPY table FROM STDIN doesn't show count tag
Previous Message David Johnston 2014-03-12 04:39:25 Re: db_user_namespace a "temporary measure"