Re: Custom Scan APIs (Re: Custom Plan node)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-25 08:59:18
Message-ID: CAFjFpReqbkbte_1Xa2NuaxXnv9yGkqi+FhY3av7s+1wkfMzpBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:

> Folks,
>
> Let me remind the custom-scan patches; that is a basis feature of
> remote join of postgres_fdw, cache-only scan, (upcoming) GPU
> acceleration feature or various alternative ways to scan/join relations.
> Unfortunately, small amount of discussion we could have in this commit
> fest, even though Hanada-san volunteered to move the patches into
> "ready for committer" state at the CF-Nov.
>

Sorry for jumping into this late.
Instead of custom node, it might be better idea to improve FDW
infrastructure to push join. For the starters, is it possible for the
custom scan node hooks to create a ForeignScan node? In general, I think,
it might be better for the custom scan hooks to create existing nodes if
they serve the purpose.

>
> Prior to time-up, I'd like to ask hacker's opinion about its potential
> arguable points (from my standpoint) if it needs to be fixed up.
> One is hook definition to add alternative join path, and the other one
> is a special varno when a custom scan replace a join node.
> I'd like to see your opinion about them while we still have to change
> the design if needed.
>
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds "add_join_path_hook" on add_paths_to_joinrel to allow
> extensions to provide alternative scan path in addition to the built-in
> join paths. Custom-scan path being added is assumed to perform to scan
> on a (virtual) relation that is a result set of joining relations.
> My concern is its arguments to be pushed. This hook is declared as follows:
>
> /* Hook for plugins to add custom join path, in addition to default ones */
> typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> RelOptInfo *joinrel,
> RelOptInfo *outerrel,
> RelOptInfo *innerrel,
> JoinType jointype,
> SpecialJoinInfo *sjinfo,
> List *restrictlist,
> List *mergeclause_list,
> SemiAntiJoinFactors *semifactors,
> Relids param_source_rels,
> Relids extra_lateral_rels);
> extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> Likely, its arguments upper than restrictlist should be informed to
> extensions,
> because these are also arguments of add_paths_to_joinrel().
> However, I'm not 100% certain how about other arguments should be informed.
> Probably, it makes sense to inform param_source_rels and extra_lateral_rels
> to check whether the path is sensible for parameterized paths.
> On the other hand, I doubt whether mergeclause_list is usuful to deliver.
> (It may make sense if someone tries to implement their own merge-join
> implementation??)
>
> I'd like to seem idea to improve the current interface specification.
>
>
Since a custom node is open implementation, it will be important to pass as
much information down to the hooks as possible; lest the hooks will be
constrained. Since the functions signatures within the planner, optimizer
will change from time to time, so the custom node hook signatures will need
to change from time to time. That might turn out to be maintenance overhead.

BTW, is it a good idea for custom nodes to also affect other paths like
append, group etc.? Will it need separate hooks for each of those?

>
> (2) CUSTOM_VAR for special Var reference
>
> @@ -134,6 +134,7 @@ typedef struct Expr
> #define INNER_VAR 65000 /* reference to inner subplan */
> #define OUTER_VAR 65001 /* reference to outer subplan */
> #define INDEX_VAR 65002 /* reference to index column */
> +#define CUSTOM_VAR 65003 /* reference to custom column */
>
> I newly added CUSTOM_VAR to handle a case when custom-scan override
> join relations.
> Var-nodes within join plan are adjusted to refer either ecxt_innertuple or
> ecxt_outertuple of ExprContext. It makes a trouble if custom-scan runs
> instead of built-in joins, because its tuples being fetched are usually
> stored on the ecxt_scantuple, thus Var-nodes also need to have right
> varno neither inner nor outer.
>
> SetPlanRefCustomScan callback, being kicked on set_plan_refs, allows
> extensions to rewrite Var-nodes within custom-scan node to indicate
> ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
> For example, a var-node with varno=CUSTOM_VAR and varattno=3 means
> this node reference the third attribute of the tuple in ecxt_scantuple.
> I think it is a reasonable solution, however, I'm not 100% certain
> whether people have more graceful idea to implement it.
>
> If you have comments around above two topic, or others, please give
> your ideas.
>
> Thanks,
>
> 2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> > Hi Stephen,
> >
> > Thanks for your comments.
> >
> >> * Kouhei Kaigai (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> >> > Is somebody available to volunteer to review the custom-scan patch?
> >>
> >> I looked through it a bit and my first take away from it was that the
> patches
> >> to actually use the new hooks were also making more changes to the
> backend
> >> code, leaving me with the impression that the proposed interface isn't
> >> terribly stable. Perhaps those changes should have just been in the
> first
> >> patch, but they weren't and that certainly gave me pause.
> >>
> > Yes, the part-1 patch provides a set of interface portion to interact
> > between the backend code and extension code. Rest of part-2 and part-3
> > portions are contrib modules that implements its feature on top of
> > custom-scan API.
> >
> >> I'm also not entirely convinced that this is the direction to go in when
> >> it comes to pushing down joins to FDWs. While that's certainly a goal
> that
> >> I think we all share, this seems to be intending to add a completely
> different
> >> feature which happens to be able to be used for that. For FDWs,
> wouldn't
> >> we only present the FDW with the paths where the foreign tables for that
> >> FDW, or perhaps just a given foreign server, are being joined?
> >>
> > FDW's join pushing down is one of the valuable use-cases of this
> interface,
> > but not all. As you might know, my motivation is to implement GPU
> acceleration
> > feature on top of this interface, that offers alternative way to scan or
> join
> > relations or potentially sort or aggregate.
> > Probably, it is too stretch interpretation if we implement radix-sort on
> top
> > of FDW. I'd like you to understand the part-3 patch (FDW's join
> pushing-down)
> > is a demonstration of custom-scan interface for application, but not
> designed
> > for a special purpose.
> >
> > Right now, I put all the logic to interact CSI and FDW driver on
> postgres_fdw
> > side, it might be an idea to have common code (like a logic to check
> whether
> > the both relations to be joined belongs to same foreign server) on the
> backend
> > side as something like a gateway of them.
> >
> >
> > As an aside, what should be the scope of FDW interface?
> > In my understanding, it allows extension to implement "something" on
> behalf of
> > a particular data structure being declared with CREATE FOREIGN TABLE.
> > In other words, extension's responsibility is to generate a view of
> "something"
> > according to PostgreSQL' internal data structure, instead of the object
> itself.
> > On the other hands, custom-scan interface allows extensions to implement
> > alternative methods to scan or join particular relations, but it is not
> a role
> > to perform as a target being referenced in queries. In other words, it
> is methods
> > to access objects.
> > It is natural both features are similar because both of them intends
> extensions
> > to hook the planner and executor, however, its purpose is different.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project
> > KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-02-25 09:20:18 Re: Trigger information for auto_explain.
Previous Message Heikki Linnakangas 2014-02-25 08:54:31 Re: Dump pageinspect to 1.2: page_header with pg_lsn datatype