Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-03-03 09:30:36
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8010BABA6@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The attached version of custom/foreign-join interface patch
fixes up the problem reported on the join-pushdown support
thread.

The previous version referenced *_ps_tlist on setrefs.c, to
check whether the Custom/ForeignScan node is associated with
a particular base relation, or not.
This logic considered above nodes performs base relation scan,
if *_ps_tlist is valid. However, it was incorrect in case when
underlying pseudo-scan relation has empty targetlist.
Instead of the previous logic, it shall be revised to check
scanrelid itself. If zero, it means Custom/ForeignScan node is
not associated with a particular base relation, thus, its slot
descriptor for scan shall be constructed based on *_ps_tlist.

Also, I noticed a potential problem if CSP/FDW driver want to
displays expression nodes using deparse_expression() but
varnode within this expression does not appear in the *_ps_tlist.
For example, a remote query below shall return rows with two
columns.

SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, ForeignScan will perform like as a scan on relation with
two columns, and FDW driver will set two TargetEntry on the
fdw_ps_tlist. If FDW is designed to keep the join condition
(aid = bid) using expression node form, it is expected to be
saved on custom/fdw_expr variable, then setrefs.c rewrites the
varnode according to *_ps_tlist.
It means, we also have to add *_ps_tlist both of "aid" and "bid"
to avoid failure on variable lookup. However, these additional
entries changes the definition of the slot descriptor.
So, I adjusted ExecInitForeignScan and ExecInitCustomScan to
use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when it construct
the slot descriptor based on the *_ps_tlist.
It expects CSP/FDW drivers to add target-entries with resjunk=true,
if it wants to have additional entries for variable lookups on
EXPLAIN command.

Fortunately or unfortunately, postgres_fdw keeps its remote query
in cstring form, so it does not need to add junk entries on the
fdw_ps_tlist.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> Sent: Sunday, February 15, 2015 11:01 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; pgsql-hackers(at)postgreSQL(dot)org; Shigeru Hanada
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> The attached patch is a rebased version of join replacement with
> foreign-/custom-scan. Here is no feature updates at this moment
> but SGML documentation is added (according to Michael's comment).
>
> This infrastructure allows foreign-data-wrapper and custom-scan-
> provider to add alternative scan paths towards relations join.
> From viewpoint of the executor, it looks like a scan on a pseudo-
> relation that is materialized from multiple relations, even though
> FDW/CSP internally processes relations join with their own logic.
>
> Its basic idea is, (1) scanrelid==0 indicates this foreign/custom
> scan node runs on a pseudo relation and (2) fdw_ps_tlist and
> custom_ps_tlist introduce the definition of the pseudo relation,
> because it is not associated with a tangible relation unlike
> simple scan case, thus planner cannot know the expected record
> type to be returned without these additional information.
> These two enhancement enables extensions to process relations
> join internally, and to perform as like existing scan node from
> viewpoint of the core backend.
>
> Also, as an aside. I had a discussion with Hanada-san about this
> interface off-list. He had an idea to keep create_plan_recurse()
> static, using a special list field in CustomPath structure to
> chain underlying Path node. If core backend translate the Path
> node to Plan node if valid list given, extension does not need to
> call create_plan_recurse() by itself.
> I have no preference about this. Does anybody have opinion?
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> > Sent: Thursday, January 15, 2015 8:03 AM
> > To: Robert Haas
> > Cc: Tom Lane; pgsql-hackers(at)postgreSQL(dot)org; Shigeru Hanada
> > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
> > API)
> >
> > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > wrote:
> > > > When custom-scan node replaced a join-plan, it shall have at least
> > > > two child plan-nodes. The callback handler of PlanCustomPath needs
> > > > to be able to call create_plan_recurse() to transform the underlying
> > > > path-nodes to plan-nodes, because this custom-scan node may take
> > > > other built-in scan or sub-join nodes as its inner/outer input.
> > > > In case of FDW, it shall kick any underlying scan relations to
> > > > remote side, thus we may not expect ForeignScan has underlying plans...
> > >
> > > Do you have an example of this?
> > >
> > Yes, even though full code set is too large for patch submission...
> >
> > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
> >
> > This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
> > It takes GpuHashJoinPath inherited from CustomPath that has multiple
> > underlying scan/join paths.
> > Once it is called back from the backend, it also calls create_plan_recurse()
> > to make inner/outer plan nodes according to the paths.
> >
> > In the result, we can see the following query execution plan that CustomScan
> > takes underlying scan plans.
> >
> > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > ------------
> > Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922
> > width=143)
> > Hash clause 1: (aid = aid)
> > Hash clause 2: (bid = bid)
> > Bulkload: On
> > -> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009
> > width=77)
> > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000
> > width=37)
> > hash keys: aid
> > nBatches: 1 Buckets: 46000 Memory Usage: 99.99%
> > -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37)
> > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000
> > width=37)
> > hash keys: bid
> > nBatches: 1 Buckets: 46000 Memory Usage: 49.99%
> > -> Seq Scan on t2 (cost=0.00..734.00 rows=40000
> > width=37)
> > (13 rows)
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >
> >
> > > -----Original Message-----
> > > From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> > > Sent: Thursday, January 15, 2015 2:07 AM
> > > To: Kaigai Kouhei(海外 浩平)
> > > Cc: Tom Lane; pgsql-hackers(at)postgreSQL(dot)org; Shigeru Hanada
> > > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
> > > [v9.5] Custom Plan API)
> > >
> > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > wrote:
> > > > When custom-scan node replaced a join-plan, it shall have at least
> > > > two child plan-nodes. The callback handler of PlanCustomPath needs
> > > > to be able to call create_plan_recurse() to transform the underlying
> > > > path-nodes to plan-nodes, because this custom-scan node may take
> > > > other built-in scan or sub-join nodes as its inner/outer input.
> > > > In case of FDW, it shall kick any underlying scan relations to
> > > > remote side, thus we may not expect ForeignScan has underlying plans...
> > >
> > > Do you have an example of this?
> > >
> > > --
> > > Robert Haas
> > > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> > > Company
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To make
> > changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers

Attachment Content-Type Size
pgsql-v9.5-custom-join.v5.patch application/octet-stream 716.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 张元超 2015-03-03 09:32:35 One question about security label command
Previous Message Kouhei Kaigai 2015-03-03 09:01:32 Re: Join push-down support for foreign tables