Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2015-05-18 05:03:39
Message-ID: CAFjFpRexQrWn3pL97cPs=P2BU0CLWWUQix0CUZJ8j2bfVh7M3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 16, 2015 at 6:34 PM, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
wrote:

> Attached is the v15 patch of foreign join support for postgres_fdw.
>
> This patch is based on current master, and having being removed some
> hunks which are not essential.
>
> And I wrote description of changes done by the patch. It is little
> bit long but I hope it would help understanding what the patch does.
>
>
>
> The total LOC of the patch is 3.7k, 1.8k for code and 2.0k for
> regression tests. This is not a small patch, as Robert says, so I'd
> like to summarize changed done by this patch and explain why they are
> necessary.
>
> Outline of join push-down support for postgres_fdw
> ==================================================
>
> This patch provides new capability to join between foriegn tables
> managed by same foreign server on remote side, by constructing a
> remote query containing join clause, and executing it as source of a
> pseudo foreign scan. This patch is based on Custom/Foreign join patch
> written by Kohei KaiGai.
>
> PostgreSQL's planning for a query containing join is done with these steps:
>
> 1. generate possible scan paths for each base relations
> 2. generate join paths with bottom-up approach
> 3. generate plan nodes required for the cheapest path
> 4. execute the plan nodes to obtain result tuples
>
> Generating path node
> --------------------
> As of now, postgres_fdw generates a ForeignPath which represents a
> result of a join for each RelOptInfo, and planner can determine which
> path is cheapest from its cost values.
>
> GetForeignJoinPaths is called once for each join combination, i.e. A
> JOIN B and B JOIN A are considered separately. So GetForeignJoinPath
> should return immediately to skip its job when the call is the
> reversed combination of already considered one. For this purpose, I
> added update_safe flag to PgFdwRelationInfo. This flag is always set
> for simple foriegn scans, but for join relation it is set only when
> the join can be pushed down. The reason of adding this flag is that
> checking RelOptInfo#fdw_private is MULL can't prevent useless
> processing for a join combination which is reversed one of already
> considered join which can't be pushed down.
>

This is just a suggestion, but you may actually get rid of the flag by
restricting the path generation only when say outer relation's pointer or
OID or relid is greater/lesser than inner relation's corresponding property.

>
> postgres_fdw's GetForeignJoinPaths() does various checks, to ensure
> that the result has same semantics as local joins. Now postgres_fdw
> have these criteria:
>
> a) join type must be one of INNER/LEFT OUTER/RIGHT OUTER/FULL OUTER join
> This check is done with given jointype argument. IOW, CROSS joins and
> SEMI/ANTI joins are not pushed down. This is because 1) CROSS joins
> would produe more result than separeted join sources,

We might loose on some optimizations in aggregate push-down by not creating
paths altogether for CROSS joins. If there is a count(*) on CROSS join
result, we will push count(*) since there doesn't exist a foreign path for
the join. OR it might be possible that pushing down A INNER JOIN B CROSS
JOIN C is cheaper than performing some or all of the joins locally. I think
we should create a path and let it stay in the paths list. If there is no
path which can use CROSS join path, it will discarded eventually. Sorry for
bringing this so late in the discussion.

> and 2) ANTI/SEMI
> joins need to be deparsed as sub-query and it seems to take some more
> time to implement.
> b) Both outer and inner must have RelOptInfo#fdw_private
> Having fdw_private means that the RelOptInfo is safe to push down, so
> having no fdw_private means that portion is not safe to push down and
> thus the whole join is not safe to push down.
> c) All relations in the join must belong to the same server
> This check is done with serverid stored in RelOptInfo#fdw_private as
> PgFdwRelationInfo. Joining relations belong to different servers is
> not leagal. Even they finally have completely same connection
> information, they should accessed via different libpq sessions.
> Responsibility of checking server matching is under discussion in the
> Custom/Foreign join thread, and I'd vote for checking it in core. If
> it is decided, I remove this criterion soon.
> d) All relations must have the same effective user id
> This check is done with userid stored in PgFdwRelationInfo, which is
> valid only when underlying relations have the same effective user id.
> Here "effective user id" means id of the user executing the query, or
> the owner of the view when the foreign table is accessed via view.
> Tom suggested that it is necessary to check that user mapping matches
> for security reason, and now I think it's better than checking
> effective user as current postgres_fdw does.
> e) Each source relation must not have any local filter
> Evaluating conditions of join source talbe potentially produces
> different result in OUTER join cases. This can be relaxed for the
> cases when the join is INNER and local filters don't contain any
> volatile function/operator, but it is left as future enhancement.
> f) All join conditions of non-inner join must be safe to push down
> This is similar to e).
>
> A join which passes all criteria above is safe to push-down, so
> postgres_fdw create a ForeignPath for the join and add it to
> RelOptInfo. Currently postgres_fdw doesn't set pathkeys (ordering
> information) nor require_outer (information for parameterized path).
>
> PgFdwRelationInfo is used to store various planning information
> specific to postgres_fdw. To support join push-down, I added some
> fields which are necessary to deparse join query recursively in
> deparseSelectSql.
>
> - outer RelOptInfo, to generate source relatoin as subquery
> - inner RelOptInfo, to generate source relation as subquery
> - jointype, to deparse JOIN keyword string (e.g. INNER JOIN)
> - joinclauses, to deprase ON part of JOIN clause
> - otherclauses, to deparse WHERE clause of join query
>
> I also moved it from postgres_fdw.c to postgres_fdw.h, because
> deparse.c needs to refer the definition during deparsing query.
>
> Generating plan node
> --------------------
> Once planner find the cheapest path, it generates plan tree from the
> path tree. During the steps, planner calls GetForeignPlan for the
> ForeignPath in the top of a join tree. IOW, GetForeignPlan is not
> called for underlying joins and scans, so postgres_fdw needs a way to
> do its task (mainly generating query string) recursively.
>
> GetForeignPlan generates remote query and store it in ForeignScan node
> to be returned. The construction of remote query is done by calling
> deparseSelectSql for given RelOptInfo. This function was modified to
> accept both base relations and join relations to support join
> push-down. Main part of generating join query is implemented in
> deparseJoinSql, which is a newly added function, and deparseSelectSql
> calls it if given relation was a join relation.
>
> One big change about deparsing base relation is aliasing. This patch
> adds column alias to SELECT clause even original query is a simple
> single table SELECT.
>
> fdw=# EXPLAIN (VERBOSE, COSTS false) SELECT * FROM pgbench_branches b;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------
> Foreign Scan on public.pgbench_branches b
> Output: bid, bbalance, filler
> Remote SQL: SELECT bid a9, bbalance a10, filler a11 FROM
> public.pgbench_branches
> (3 rows)
>
> As you see, every column has alias in format "a%d" with index value
> derived from pg_attribute.attnum. Index value is attnum + 8, and the
> magic number "8" comes from FirstLowInvalidHeapAttributeNumber for the
> adjustment that makes attribute number of system attributes positive.
> To make the code more readable, I introduced local macro
> GET_RELATIVE_ATTNO(), but I'm not sure that this name is very nice.
> This column alias system allows to refer a particular column pointed
> by Var node in upper join level without consideration about column
> position in SELECT clause. To achieve this, deparseVar is expanded to
> handle variables used in join query, and deparseJoinVar is added to
> deparse column reference with table alias "l" or "r" for outer or
> inner respectively.
>
> As mentioned at the beginning of this section, GetForeignPlan is
> called only for the top node of the join tree, so we need to do
> something recursively. So postgres_fdw has information about join
> tree structure in PgFdwRelationInfo and pass it via
> RelOptInfo#fdw_private. This link works down to the base relation at
> the leaf of the join tree.
>
> When deparseSelectSql is called for a join relation, it calls
> deparseSelectSql for each outer and inner RelOptInfo to generate query
> strings, and pass them to deparseJoinSql as parts for FROM clause
> subquery. For example of two way join:
>
> [table definition]
> CREATE FOREIGN TABLE table1 (
> id int NOT NULL,
> name text,
> ...
> ) SERVER server;
> CREATE FOREIGN TABLE table2 (
> id int NOT NULL,
> name text,
> ...
> ) SERVER server;
>
> [original query]
> SELECT t1.name, t2.name FROM table1 t1 INNER JOIN table2 t2 ON t1.id =
> t2.id;
>
> [remote query]
> SELECT l.a2, r.a2
> FROM (SELECT id, name FROM table1) l (a1, a2)
> INNER JOIN
> (SELECT id, name FROM table2) r (a1, a2)
> ON (l.a1 = r.a1)
> ;
>
> During deparsing join query, deparseJoinSql maintains fdw_scan_tlist
> list to hold TargetEntry for columns used in SELECT clause of every
> query.
>
> One thing tricky is "peusdo projection" which is done by
> deparseProjectionSql for whole-row reference. This is done by put the
> query string in FROM subquery and add whole-row reference in outer
> SELECT clause. This is done by ExecProjection in 9.4 and older, but
> we need to do this in postgres_fdw because ExecProjection is not
> called any more.
>
> For various conditions, such as join conditions in JOIN clauses and
> filter conditions in WHERE clauses, appendCondition is used to deparse
> condition strings. This was expanded version of appendWhereClause.
> Note that appendConditions accepts list of RestrictInfo or list of
> Expr as source, and downcasts them properly. As of 9.4
> appendWhereClause accepted only RestrictInfo, but join conditions are
> Expr, so I made it little flexible.
>
> Finally deparseJoinSql construct whole query by putting parts into the
> right places. Note that column aliases are not written in SELECT
> clause but in FROM clause, after table alias. This simpifies SELECT
> clause construction simpler.
>

About deparsing stuff, it looks like we are duplicating the functionality
in ruleutils.c and more push-down stuff we add, more will be the
duplication. Can we reuse that functionality?

>
> debug stuffs
> ------------
> bms_to_str is a function which prints contents of a bitmapset in
> human-readable format. I added this for debug purpose, but IMO it's
> ok to have such function as public bitmapset API.
>
> 2015-05-03 10:51 GMT+09:00 Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>:
> > Thanks for the comments.
> >
> > 2015/05/01 22:35、Robert Haas <robertmhaas(at)gmail(dot)com> のメール:
> >> On Mon, Apr 27, 2015 at 5:07 AM, Shigeru Hanada
> >> <shigeru(dot)hanada(at)gmail(dot)com> wrote:
> >>> 2015-04-27 11:00 GMT+09:00 Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> >>>> Hanada-san, could you adjust your postgres_fdw patch according to
> >>>> the above new (previous?) definition.
> >>>
> >>> The attached v14 patch is the revised version for your v13 patch. It
> also contains changed for Ashutosh’s comments.
> >>
> >> We should probably move this discussion to a new thread now that the
> >> other patch is committed. Changing subject line accordingly.
> >>
> >> Generally, there's an awful lot of changes in this patch - it is over
> >> 2000 insertions and more than 450 deletions - and it's not awfully
> >> obvious why all of those changes are there. I think this patch needs
> >> a detailed README to accompany it explaining what the various changes
> >> in the patch are and why those things got changed; or maybe there is a
> >> way to break it up into multiple patches so that we can take a more
> >> incremental approach. I am really suspicious of the amount of
> >> wholesale reorganization of code that this patch is doing. It's
> >> really hard to validate that a reorganization like that is necessary,
> >> or that it's correct, and it's gonna make back-patching noticeably
> >> harder in the future. If we really need this much code churn it needs
> >> careful justification; if we don't, we shouldn't do it.
> >>
> >
> > I agree. I’ll write detailed description for the patch and repost the
> new one with rebasing onto current HEAD. I’m sorry but it will take a day
> or so...
> >
> >> +SET enable_mergejoin = off; -- planner choose MergeJoin even it has
> >> higher costs, so disable it for testing.
> >>
> >> This seems awfully strange. Why would the planner choose a plan if it
> >> had a higher cost?
> >
> > I thought so but I couldn’t reproduce such situation now. I’ll
> investigate it more. If the issue has gone, I’ll remove that SET statement
> for straightforward tests.
> >
> >
> >>
> >> - * If the table or the server is configured to use remote
> estimates,
> >> - * identify which user to do remote access as during planning.
> This
> >> + * Identify which user to do remote access as during planning.
> This
> >> * should match what ExecCheckRTEPerms() does. If we fail due
> >> to lack of
> >> * permissions, the query would have failed at runtime anyway.
> >> */
> >> - if (fpinfo->use_remote_estimate)
> >> - {
> >> - RangeTblEntry *rte = planner_rt_fetch(baserel->relid,
> root);
> >> - Oid userid = rte->checkAsUser ?
> >> rte->checkAsUser : GetUserId();
> >> -
> >> - fpinfo->user = GetUserMapping(userid,
> fpinfo->server->serverid);
> >> - }
> >> - else
> >> - fpinfo->user = NULL;
> >> + rte = planner_rt_fetch(baserel->relid, root);
> >> + fpinfo->userid = rte->checkAsUser ? rte->checkAsUser :
> GetUserId();
> >>
> >> So, wait a minute, remote estimates aren't optional any more?
> >
> > No, it seems to be removed accidentally. I’ll check the reason again
> though, but I’ll revert the change unless I find any problem.
> >
> > --
> > Shigeru HANADA
> > shigeru(dot)hanada(at)gmail(dot)com
> >
> >
> >
> >
>
>
>
> --
> Shigeru HANADA
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
> --
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2015-05-18 05:47:52 Re: [COMMITTERS] pgsql: Add pg_audit, an auditing extension
Previous Message Abbas Butt 2015-05-18 04:32:18 Re: 9.5 open items