Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2015-12-03 07:06:02
Message-ID: 565FE9DA.2020204@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

On 2015/12/02 20:45, Ashutosh Bapat wrote:
> It's been a long time since last patch on this thread was posted. I have
> started
> to work on supporting join pushdown for postgres_fdw.

Thanks for the work!

> Generating paths
> ================

> A join between two foreign relations is considered safe to push down if
> 1. The joining sides are pushable
> 2. The type of join is OUTER or INNER (LEFT/RIGHT/FULL/INNER). SEMI and ANTI
> joins are not considered right now, because of difficulties in
> constructing
> the queries involving those. The join clauses of SEMI/ANTI joins are
> not in a
> form that can be readily converted to IN/EXISTS/NOT EXIST kind of
> expression.
> We might consider this as future optimization.
> 3. Joining sides do not have clauses which can not be pushed down to the
> foreign
> server. For an OUTER join this is important since those clauses need
> to be
> applied before performing the join and thus join can not be pushed
> to the
> foreign server. An example is
> SELECT * FROM ft1 LEFT JOIN (SELECT * FROM ft2 where local_cond) ft2
> ON (join clause)
> Here the local_cond on ft2 needs to be executed before performing
> LEFT JOIN
> between ft1 and ft2.
> This condition can be relaxed for an INNER join by pulling the local
> clauses
> up the join tree. But this needs more investigation and is not
> considered in
> this version.
> 4. The join conditions (e.g. conditions in ON clause) are all safe to
> push down.
> This is important for OUTER joins as pushing down join clauses
> partially and
> applying rest locally changes the result. There are ways [1] by
> which partial
> OUTER join can be completed by applying unpushable clauses locally
> and then
> nullifying the nullable side and eliminating duplicate non-nullable side
> rows. But that's again out of scope of first version of postgres_fdw
> join
> pushdown.

As for 4, as commented in the patch, we could relax the requirement that
all the join conditions (given by JoinPathExtraData's restrictlist) need
to be safe to push down to the remote server;
* In case of inner join, all the conditions would not need to be safe.
* In case of outer join, all the "otherclauses" would not need to be
safe, while I think all the "joinclauses" need to be safe to get the
right results (where "joinclauses" and "otherclauses" are defined by
extract_actual_join_clauses). And I think we should do this relaxation
to some extent for 9.6, to allow more joins to be pushed down. I don't
know about [1]. May I see more information about [1]?

> Generating plan
> ===============

> Rest of this section describes the logic to construct
> the SQL
> for join; the logic is implemented as function deparseSelectSqlForRel().
>
> deparseSelectSqlForRel() builds the SQL for given joinrel (and now for
> baserel
> asd well) recursively.
> For joinrels
> 1. it constructs SQL representing either side of join, by calling itself
> in recursive fashion.
> 2. These SQLs are converted into subqueries and become part of the FROM
> clause
> with appropriate JOIN type and clauses. The left and right
> subqueries are
> given aliases "l" and "r" respectively. The columns in each subquery are
> aliased as "a1", "a2", "a3" and so on. Thus the third column on left
> side can
> be referenced as "l.a3" at any recursion level.
> 3. Targetlist is added representing the columns in the join result
> expected at
> that level.
> 4. The join clauses are added as part of ON clause
> 5. Any clauses that planner has deemed fit to be evaluated at that level
> of join
> are added as part of WHERE clause.

Honestly, I'm not sure that that is a good idea. One reason for that is
that a query string constructed by the procedure is difficult to read
especially when the procedure is applied recursively. So, I'm thinking
to revise the procedure so as to construct a query string with a
flattened FROM clause, as discussed in eg, [2].

> TODOs
> =====
> This patch is very much WIP patch to show case the approach and invite early
> comments. I will continue to improve the patch and some of the areas
> that will
> be improved are
> 1. Costing of foreign join paths.
> 2. Various TODOs in the patch, making it more readable, finishing etc.
> 3. Tests
> 4. Any comments/suggestions on approach or the attached patch.

That would be great!

> In another thread Robert, Fujita-san and Kaigai-san are discussing about
> EvalPlanQual support for foreign joins. Corresponding changes to
> postgres_fdw
> will need to be added once those changes get committed.

Yeah, we would need those changes including helper functions to create a
local join execution plan for that support. I'd like to add those
changes to your updated patch if it's okay.

Best regards,
Etsuro Fujita

[2]
http://www.postgresql.org/message-id/CA+TgmoZH9PB8BC+Z3rE7wo8CwuxAF7VP3066iSG39QfR1jJ+UQ@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-12-03 07:06:25 Re: Parallel Aggregate
Previous Message Pavel Stehule 2015-12-03 06:58:59 Re: proposal: PL/Pythonu - function ereport