Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-03-17 10:12:57
Message-ID: CAFjFpRe2EvDUpgfBFejo6DyDCSi7dryGQ7hE8EistcO7jwwvMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 17, 2015 at 10:28 AM, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
wrote:

> 2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> > I think the foreign data wrapper join pushdown case, which also aims
> > to substitute a scan for a join, is interesting to think about, even
> > though it's likely to be handled by a new FDW method instead of via
> > the hook. Where should the FDW method get called from? Currently,
> > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
> > called from add_paths_to_joinrel(). The patch at
> >
> http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
> > uses that to implement join pushdown in postgres_fdw; if you have A
> > JOIN B JOIN C all on server X, we'll notice that the join with A and B
> > can be turned into a foreign scan on A JOIN B, and similarly for A-C
> > and B-C. Then, if it turns out that the cheapest path for A-B is the
> > foreign join, and the cheapest path for C is a foreign scan, we'll
> > arrive at the idea of a foreign scan on A-B-C, and we'll realize the
> > same thing in each of the other combinations as well. So, eventually
> > the foreign join gets pushed down.
>
> From the viewpoint of postgres_fdw, incremental approach seemed
> natural way, although postgres_fdw should consider paths in pathlist
> in additon to cheapest one as you mentioned in another thread. This
> approarch allows FDW to use SQL statement generated for underlying
> scans as parts of FROM clause, as postgres_fdw does in the join
> push-down patch.
>
> > But there's another possible approach: suppose that
> > join_search_one_level, after considering left-sided and right-sided
> > joins and after considering bushy joins, checks whether every relation
> > it's got is from the same foreign server, and if so, asks that foreign
> > server whether it would like to contribute any paths. Would that be
> > better or worse? A disadvantage is that if you've got something like
> > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> > down (say, each join clause calls a non-pushdown-safe function) you'll
> > end up examining a pile of joinrels - at every level of the join tree
> > - and individually rejecting each one. With the
> > build-it-up-incrementally approach, you'll figure that all out at
> > level 2, and then after that there's nothing to do but give up
> > quickly. On the other hand, I'm afraid the incremental approach might
> > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> > foreign tables on the same server. If the output of the big/huge join
> > is big, none of those paths are going to survive at level 2, but the
> > overall join size might be very small, so we surely want a chance to
> > recover at level 3. (We discussed test cases of this form quite a bit
> > in the context of e2fa76d80ba571d4de8992de6386536867250474.)
>
> Interesting, I overlooked that pattern. As you pointed out, join
> between big foregin tables might be dominated, perhaps by a MergeJoin
> path. Leaving dominated ForeignPath in pathlist for more optimization
> in the future (in higher join level) is an idea, but it would make
> planning time longer (and use more cycle and memory).
>
> Tom's idea sounds good for saving the path b), but I worry that
> whether FDW can get enough information at that timing, just before
> set_cheapest. It would not be good I/F if each FDW needs to copy many
> code form joinrel.c...
>
>
Even I have the same concern. A simple joinrel doesn't contain much
information about the individual two way joins involved in it, so FDW may
not be able to construct a query (or execution plan) and hence judge
whether a join is pushable or not, just by looking at the joinrel. There
will be a lot of code duplication to reconstruct that information, within
the FDW code.

> --
> Shigeru HANADA
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Shchukin 2015-03-17 10:22:27 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Ashutosh Bapat 2015-03-17 09:40:09 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)