Re: Join push-down support for foreign tables

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Join push-down support for foreign tables
Date: 2014-12-15 08:40:18
Message-ID: CAEZqfEfnXDMSh2PuQFbGNJh+DxbSkLqLRGaNGh_pNTBVtaX1cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I'm working on $SUBJECT and would like to get comments about the
design. Attached patch is for the design below. Note that the patch
requires Kaigai-san's custom foriegn join patch[1]

[1] http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80108C355@BPXM15GP.gisp.nec.co.jp

Joins to be pushed down
=======================
We have two levels of decision about Join push-down, core and FDW. I
think we should allow them to push down joins as much as we can unless
it doesn't break the semantics of join. Anyway FDWs should decide
whether the join can be pushed down or not, on the basis of the FDW's
capability.

Here is the list of checks which should be done in core:

1. Join source relations
All of foreign tables used in a join should be managed by one foreign
data wrapper. I once proposed that all source tables should belong to
one server, because at that time I assumed that FDWs use SERVER to
express physical place of data source. But Robert's comment gave me
an idea that SERVER is not important for some FDWs, so now I think
check about server matching should be done by FDWs.

USER MAPPING is another important attribute of foreign scan/join, and
IMO it should be checked by FDW because some of FDWs don't require
USER MAPPING. If an FDW want to check user mapping, all tables in the
join should belong to the same server and have same
RangeTablEntry#checkAsUser to ensure that only one user mapping is
derived.

2. Join type
Join type can be any, except JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER,
though most of FDWs would support only INNER and OUTER.
Pushing down CROSS joins might seem inefficient, because obviously
CROSS JOIN always produces more result than retrieving all rows from
each foreign table separately. However, some FDW might be able to
accelerate such join with cache or something. So I think we should
leave such decision to FDWs.

Here is the list of checks which shold be done in postgres_fdw:

1. Join source relations
As described above, postgres_fdw (and most of SQL-based FDWs) needs to
check that 1) all foreign tables in the join belong to a server, and
2) all foreign tables have same checkAsUser.
In addition to that, I add extra limitation that both inner/outer
should be plain foreign tables, not a result of foreign join. This
limiation makes SQL generator simple. Fundamentally it's possible to
join even join relations, so N-way join is listed as enhancement item
below.

2. Join type
In the first proposal, postgres_fdw allows INNER and OUTER joins to be
pushed down. CROSS, SEMI and ANTI would have much less use cases.

3. Join conditions and WHERE clauses
Join conditions should consist of semantically safe expressions.
Where the "semantically safe" means is same as WHERE clause push-down.

Planned enhancements for 9.5
============================
These features will be proposed as enhancements, hopefully in the 9.5
development cycle, but probably in 9.6.

1. Remove unnecessary column from SELECT clause
Columns which are used for only join conditions can be removed from
the target list, as postgres_fdw does in simple foreign scans.

2. Support N-way joins
Mostly for difficulty of SQL generation, I didn't add support of N-Way joins.

3. Proper cost estimation
Currently postgres_fdw always gives 0 as the cost of a foreign join,
as a compromise. This is because estimating costs of each join
without round-trip (EXPLAIN) is not easy. A rough idea about that I
currently have is to use local statistics, but determining join method
used at remote might require whole planner to run for the join
subtree.

Regards,
--
Shigeru HANADA

Attachment Content-Type Size
join_pushdown.patch application/octet-stream 37.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2014-12-15 08:41:36 Re: tracking commit timestamps
Previous Message Kouhei Kaigai 2014-12-15 08:40:09 Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)