Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-25 04:28:00
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F7EA31@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > /* Hook for plugins to add custom join path, in addition to default
> > ones */ typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> > RelOptInfo *joinrel,
> > RelOptInfo *outerrel,
> > RelOptInfo *innerrel,
> > JoinType jointype,
> > SpecialJoinInfo *sjinfo,
> > List *restrictlist,
> > List *mergeclause_list,
> > SemiAntiJoinFactors *semifactors,
> > Relids param_source_rels,
> > Relids extra_lateral_rels);
> > extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
> >
> > Likely, its arguments upper than restrictlist should be informed to
> > extensions, because these are also arguments of add_paths_to_joinrel().
> > However, I'm not 100% certain how about other arguments should be informed.
> > Probably, it makes sense to inform param_source_rels and
> > extra_lateral_rels to check whether the path is sensible for parameterized
> paths.
> > On the other hand, I doubt whether mergeclause_list is usuful to deliver.
> > (It may make sense if someone tries to implement their own merge-join
> > implementation??)
> >
> > I'd like to seem idea to improve the current interface specification.
>
> I've read the code path to add custom join again, and felt that providing
> semifactors seems not necessary for the first cut, because it is used in
> only initial_cost_nestloop (final_cost_nestloop receives semifactors but
> it is not used there), and external module would not become so smart before
> 9.5 development cycle. It seems enough complex to postpone determinig
> whether it's essential for add_join_path_hook.
> Do you have any concrete use case for the parameter?
>
The reason why I asked the question above is, I haven't been 100% certain
about its usage. Indeed, semifactors is applied on a limited usage, but
quite easy to reproduce by extension later (using clauselist_selectivity)
if extension wants this factor. So, I agree with removing the semifactors
here.

> mergeclause_list and param_source_rels seem little easier to use, but
> anyway it should be documented how to use those parameters.
>
The mergeclause_list might not be sufficient for extensions to determine
whether its own mergejoin is applicable here. See the comment below; that
is on the head of select_mergejoin_clauses.

| * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
| * this is a right/full join and there are nonmergejoinable join clauses.
| * The executor's mergejoin machinery cannot handle such cases, so we have
| * to avoid generating a mergejoin plan. (Note that this flag does NOT
| * consider whether there are actually any mergejoinable clauses. This is
| * correct because in some cases we need to build a clauseless mergejoin.
| * Simply returning NIL is therefore not enough to distinguish safe from
| * unsafe cases.)
|
It says, mergejoin_clause == NIL is not a sufficient check to determine
whether the mergejoin logic is applicable on the target join.
So, either of them is probably an option for extension that tries to implement
their own mergejoin logic; (1) putting both of mergejoin_allowed and
mergeclause_list as arguments of the hook, or (2) re-definition of
select_mergejoin_clauses() as extern function to reproduce the variables on
demand. Which one is more preferable?

BTW, I found a duplicate clause_sides_match_join() definition, that is
invoked at select_mergejoin_clauses(), in joinpath.c and analyzejoins.c.
Either of them should be eliminated, I think.

For param_source_rels and extra_lateral_rels, I'll put source code comments
around add_join_path_hook.
Earlier half of try_(nestloop|hashjoin|mergejoin)_path is probably useful
as example of extension implementation.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: Shigeru Hanada [mailto:shigeru(dot)hanada(at)gmail(dot)com]
> Sent: Tuesday, February 25, 2014 12:41 AM
> To: Kohei KaiGai
> Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Jim Mlodgenski; Robert Haas;
> Tom Lane; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hi Kaigai-san,
>
> Sorry to leave the thread for a while.
>
> 2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> > (1) Interface to add alternative paths in addition to built-in join
> > paths
> >
> > This patch adds "add_join_path_hook" on add_paths_to_joinrel to allow
> > extensions to provide alternative scan path in addition to the
> > built-in join paths. Custom-scan path being added is assumed to
> > perform to scan on a (virtual) relation that is a result set of joining
> relations.
> > My concern is its arguments to be pushed. This hook is declared as follows:
> >
> > /* Hook for plugins to add custom join path, in addition to default
> > ones */ typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> > RelOptInfo *joinrel,
> > RelOptInfo *outerrel,
> > RelOptInfo *innerrel,
> > JoinType jointype,
> > SpecialJoinInfo *sjinfo,
> > List *restrictlist,
> > List *mergeclause_list,
> > SemiAntiJoinFactors
> *semifactors,
> > Relids param_source_rels,
> > Relids extra_lateral_rels);
> > extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
> >
> > Likely, its arguments upper than restrictlist should be informed to
> > extensions, because these are also arguments of add_paths_to_joinrel().
> > However, I'm not 100% certain how about other arguments should be informed.
> > Probably, it makes sense to inform param_source_rels and
> > extra_lateral_rels to check whether the path is sensible for parameterized
> paths.
> > On the other hand, I doubt whether mergeclause_list is usuful to deliver.
> > (It may make sense if someone tries to implement their own merge-join
> > implementation??)
> >
> > I'd like to seem idea to improve the current interface specification.
>
> I've read the code path to add custom join again, and felt that providing
> semifactors seems not necessary for the first cut, because it is used in
> only initial_cost_nestloop (final_cost_nestloop receives semifactors but
> it is not used there), and external module would not become so smart before
> 9.5 development cycle. It seems enough complex to postpone determinig
> whether it's essential for add_join_path_hook.
> Do you have any concrete use case for the parameter?
>
> mergeclause_list and param_source_rels seem little easier to use, but
> anyway it should be documented how to use those parameters.
>
> IMHO, minimal interface seems better than fully-fledged but half-baked one,
> especially in the first-cut.
>
> --
> Shigeru HANADA

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-02-25 04:37:45 Re: Dump pageinspect to 1.2: page_header with pg_lsn datatype
Previous Message Rajeev rastogi 2014-02-25 04:03:14 Re: Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire