Re: Custom Scan APIs (Re: Custom Plan node)

From: Jim Mlodgenski <jimmy76(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2013-11-18 23:47:10
Message-ID: CAB_5SRcEVvXiQxoxmFO--ujPeMyS6zfQsrJrkXTPOEGECv85NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:

> The attached patches are the revised custom-scan APIs.
>

My initial review on this feature:
- The patches apply and build, but it produces a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]

I'd recommend that you split the part1 patch containing the ctidscan
contrib into its own patch. It is more than half of the patch and its
certainly stands on its own. IMO, I think ctidscan fits a very specific use
case and would be better off being an extension instead of in contrib.

> - Custom-scan.sgml was added to introduce the way to write custom-scan
> provider in the official documentation.
> - Much code duplication in postgres_fdw.c was eliminated. I split some fdw-
> handlers into two parts; common portion and fdw specific one.
> Executor callbacks of custom-scan code utilizes the common portion above
> because most of its implementations are equivalent.
>
> I'd like to see comments regarding to the way to handle Var reference onto
> a custom-scan that replaced relations join.
> A varno of Var that references a join relation is rtindex of either
> right or left
> relation, then setrefs.c adjust it well; INNER_VAR or OUTER_VAR shall be
> set instead.
> However, it does not work well if a custom-scan that just references result
> of remote join query was chosen instead of local join, because its result
> shall be usually set in the ps_ResultTupleSlot of PlanState, thus
> ExecEvalScalarVar does not reference neither inner nor outer slot.
> Instead of existing solution, I added one more special varno; CUSTOM_VARNO
> that just references result-tuple-slot of the target relation.
> If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column name from
> the TupleDesc of underlying ps_ResultTupleSlot.
> I'm not 100% certain whether it is the best approach for us, but it works
> well.
>
> Also, I'm uncertain for usage of param_info in Path structure, even though
> I followed the manner in other portion. So, please point out if my usage
> was not applicable well.
>
> Thanks,
>
> 2013/11/11 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> > Hi,
> >
> > I tried to write up a wikipage to introduce how custom-scan works.
> >
> > https://wiki.postgresql.org/wiki/CustomScanAPI
> >
> > Any comments please.
> >
> > 2013/11/6 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> >> The attached patches provide a feature to implement custom scan node
> >> that allows extension to replace a part of plan tree with its own code
> >> instead of the built-in logic.
> >> In addition to the previous proposition, it enables us to integrate
> custom
> >> scan as a part of candidate paths to be chosen by optimizer.
> >> Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
> >> a set of API stuff and a simple demonstration module that implement
> >> regular table scan using inequality operator on ctid system column.
> >> The second one (pgsql-v9.4-custom-scan-remote-join) enhances
> >> postgres_fdw to support remote join capability.
> >>
> >> Below is an example to show how does custom-scan work.
> >>
> >> We usually run sequential scan even if clause has inequality operator
> >> that references ctid system column.
> >>
> >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
> >> QUERY PLAN
> >> --------------------------------------------------------
> >> Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
> >> Filter: (ctid > '(10,0)'::tid)
> >> (2 rows)
> >>
> >> An extension that performs as custom-scan provider suggests
> >> an alternative path, and its cost was less than sequential scan,
> >> thus optimizer choose it.
> >>
> >> postgres=# LOAD 'ctidscan';
> >> LOAD
> >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
> >> QUERY PLAN
> >> ----------------------------------------------------------------------
> >> Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
> >> Filter: (ctid > '(10,0)'::tid)
> >> (2 rows)
> >>
> >> Of course, more cost effective plan will win if exists.
> >>
> >> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND
> a = 200;
> >> QUERY PLAN
> >> -------------------------------------------------------------------
> >> Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
> >> Index Cond: (a = 200)
> >> Filter: (ctid > '(10,0)'::tid)
> >> (3 rows)
> >>
> >> One other worthwhile example is remote-join enhancement on the
> >> postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
> >> managed by same foreign server.
> >>
> >> postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
> >> WHERE f_leak(b) AND y
> >> like '%aaa%';
> >> QUERY
> PLAN
> >>
> ------------------------------------------------------------------------------------------------------
> >> Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
> >> Output: a, b, x, y
> >> Filter: f_leak(b)
> >> Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
> >> public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
> >> (4 rows)
> >>
> >> ---------------------------
> >> How does it works
> >> ---------------------------
> >> This patch adds two hooks (for base and join relations) around
> allpaths.c
> >> and joinpaths.c. It allows extensions to add alternative paths to handle
> >> scanning on the base relation or join of two relations.
> >>
> >> Its callback routine can add CustomPath using add_path() to inform
> >> optimizer this alternative scan path. Every custom-scan provider is
> >> identified by its name being registered preliminary using the following
> >> function.
> >>
> >> void register_custom_provider(const CustomProvider *provider);
> >>
> >> CustomProvider is a set of name string and function pointers of
> callbacks.
> >>
> >> Once CustomPath got chosen, create_scan_plan() construct a custom-
> >> scan plan and calls back extension to initialize the node.
> >> Rest of portions are similar to foreign scan, however, some of detailed
> >> portions are different. For example, foreign scan is assumed to return
> >> a tuple being formed according to table definition. On the other hand,
> >> custom-scan does not have such assumption, so extension needs to
> >> set tuple-descriptor on the scan tuple slot of ScanState structure by
> >> itself.
> >>
> >> In case of join, custom-scan performs as like a regular scan but it
> >> returns tuples being already joined on underlying relations.
> >> The patched postgres_fdw utilizes a hook at joinpaths.c to run
> >> remote join.
> >>
> >> ------------
> >> Issues
> >> ------------
> >> I'm not 100% certain whether arguments of add_join_path_hook is
> >> reasonable. I guess the first 7 arguments are minimum necessity.
> >> The mergeclause_list and semifactors might be useful if someone
> >> tries to implement its own mergejoin or semijoin. Also, I'm not
> >> good at usage of path parameterization, but the last two arguments
> >> are related to. Where is the best code to learn about its usage?
> >>
> >> +/* Hook for plugins to add custom join path, in addition to default
> ones */
> >> +typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> >> + RelOptInfo *joinrel,
> >> + RelOptInfo *outerrel,
> >> + RelOptInfo *innerrel,
> >> + JoinType jointype,
> >> + SpecialJoinInfo *sjinfo,
> >> + List *restrictlist,
> >> + List *mergeclause_list,
> >> + SemiAntiJoinFactors
> *semifactors,
> >> + Relids param_source_rels,
> >> + Relids extra_lateral_rels);
> >> +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
> >>
> >>
> >> When we replace a join by a custom scan, where is the best target
> >> for Var node that referenced relations under the join?
> >> Usually, Var->varno is given as rtindex of tables being joined, then,
> >> it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
> >> It eventually determines the slot to be fetched on ExecEvalScalarVar().
> >> On the other hand, we want Var-node to reference scan-tuple-slot
> >> neither outer-slot nor inner-slot, if we replaced a join.
> >> I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
> >> Probably, it is a straightforward way to run remote join as like a scan,
> >> but I'm not certain whether it is the best way.
> >>
> >>
> >> I was concerned about FDW callback of postgres_fdw is designed to
> >> take ForeignState argument. Because of this, remote join code did
> >> not available to call these routines, even though most of custom-join
> >> portions are similar.
> >> So, I'd like to rework postgres_fdw first to put a common routine that
> >> can be called from FDW portion and remote join portions.
> >> However, I thought it makes reviewing hard due to the large scale of
> >> changeset. So, I'd like to have a code reworking first.
> >>
> >>
> >> ----------------
> >> Jobs to do
> >> ----------------
> >> * SGML documentation like fdwhandler.sgml is still under construction.
> >> * Probably, a wikipage may help people to understand it well.
> >> * Postgres_fdw needs reworking to share common code for both of
> >> FDW and remote join portions.
> >>
> >> Thanks,
> >>
> >> 2013/10/5 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> >>> 2013/10/3 Robert Haas <robertmhaas(at)gmail(dot)com>:
> >>>> Well, there were a lot of problems with your demonstration, which have
> >>>> already been pointed out upthread. I'm skeptical about the idea of
> >>>> simply replacing planner nodes wholesale, and Tom is outright opposed.
> >>>> I think you'll do better to focus on a narrower case - I'd suggest
> >>>> custom scan nodes - and leave the rest as a project for another time.
> >>>>
> >>> Thanks, it makes me clear what we should target on v9.4 development.
> >>> Towards the next commitfest, I'm planning to develop the following
> >>> features:
> >>> * CustomScan node that can run custom code instead of built-in
> >>> scan nodes.
> >>> * Join-pushdown of postgres_fdw using the hook to be located on
> >>> the add_paths_to_joinrel(), for demonstration purpose.
> >>> * Something new way to scan a relation; probably, your suggested
> >>> ctid scan with less or bigger qualifier is a good example, also for
> >>> demonstration purpose.
> >>>
> >>> Probably, above set of jobs will be the first chunk of this feature.
> >>> Then, let's do other stuff like Append, Sort, Aggregate and so on
> >>> later. It seems to me a reasonable strategy.
> >>>
> >>
> >> --
> >> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
> >
> >
> >
> > --
> > KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>
>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-11-19 00:37:28 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Antonin Houska 2013-11-18 22:59:21 Review: HStore Gin Speedup