Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2016-02-03 17:08:46
Message-ID: CAFjFpRf-DOXh=ZcS9jLo-=VDgGq-YOfH5K3tP+Q7g2eX7-wp5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

PFA patches with naming conventions similar to previous ones.
pg_fdw_core_v7.patch: core changes
pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
pg_join_pd_v7.patch: combined patch for ease of testing.

On Tue, Feb 2, 2016 at 10:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Feb 2, 2016 at 11:21 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> >> Why does deparseSelectStmtForRel change the order of the existing
> >> arguments? I have no issue with adding new arguments as required, but
> >> rearranging the existing argument order doesn't serve any useful
> >> purpose that is immediately apparent.
> >
> > deparseSelectStmtForRel has two sets of arguments, input and output. They
> > are separated in the declaration all inputs come first, followed by all
> > outputs. The inputs were ordered according to their appearance in SELECT
> > statement, so I added tlist before remote_conds. I should have added
> > relations, which is an output argument, at the end, but I accidentally
> added
> > it between existing output arguments. Anyway, I will go ahead and just
> add
> > the new arguments after the existing ones.
>
> No, that's not what I'm asking for, nor do I think it's right. What
> I'm complaining about is that originally params_list was after
> retrieved_attrs, but in v5 it's before retrieved_attrs. I'm fine with
> inserting tlist after rel, or in general inserting new arguments in
> the sequence. But you reversed the relative ordering of params_list
> and retrieved_attrs.
>

Ok, fixed in this patch.

>
> > I was thinking on the similar lines except rN aliases. I think there
> will be
> > problem for queries like
> > postgres=# explain verbose select * from lt left join (select bar.a,
> foo.b
> > from bar left join foo on (bar.a = foo.a) where bar.b + foo.b < 10) q on
> > (lt.b = q.b);
> > QUERY PLAN
> >
> --------------------------------------------------------------------------------
> > Hash Right Join (cost=318.03..872.45 rows=43 width=16)
> > Output: lt.a, lt.b, bar.a, foo.b
> > Hash Cond: (foo.b = lt.b)
> > -> Merge Join (cost=317.01..839.07 rows=8513 width=8)
> > Output: bar.a, foo.b
> > Merge Cond: (bar.a = foo.a)
> > Join Filter: ((bar.b + foo.b) < 10)
> > -> Sort (cost=158.51..164.16 rows=2260 width=8)
> > Output: bar.a, bar.b
> > Sort Key: bar.a
> > -> Seq Scan on public.bar (cost=0.00..32.60 rows=2260
> > width=8)
> > Output: bar.a, bar.b
> > -> Sort (cost=158.51..164.16 rows=2260 width=8)
> > Output: foo.b, foo.a
> > Sort Key: foo.a
> > -> Seq Scan on public.foo (cost=0.00..32.60 rows=2260
> > width=8)
> > Output: foo.b, foo.a
> > -> Hash (cost=1.01..1.01 rows=1 width=8)
> > Output: lt.a, lt.b
> > -> Seq Scan on public.lt (cost=0.00..1.01 rows=1 width=8)
> > Output: lt.a, lt.b
> > (21 rows)
> >
> > The subquery q is pulled up, so there won't be trace of q in the join
> tree
> > except may be a useless RTE for the subquery. There will be RelOptInfo
> > representing join between lt, bar and foo and a RelOptInfo for join
> between
> > bar and foo. The join filter bar.b + foo.b < 10 needs to be evaluated
> before
> > joining (bar, foo) with lt and should go with bar left join foo. But the
> > syntax doesn't support something like "bar left join foo on (bar.a =
> foo.a)
> > where bar.b + foo.b". So we will have to construct a SELECT statement for
> > this join and add to the FROM clause with a subquery alias and then refer
> > the columns of foo and bar with that subquery alias.
>
> Hmm, does it work if we put bar.b + foo.b < 10 in the ON clause for
> the join between lt and foo/bar? I think so...
>
> > Further during the process of qual placement, quals that can be
> evaluated at
> > the level of given relation in the join tree are attached to that
> relation
> > if they can be pushed down. Thus if we see a qual attached to a given
> > relation, AFAIU, we can not say whether it needs to be evaluated there
> > (similar to above query) or planner pushed it down for optimization, and
> > thus for every join relation with quals we will need to build subqueries
> > with aliases.
>
> I don't think that's true. I theorize that every qual can either go
> into the top level WHERE clause or the ON clause of some join.
>
>
The patch implements your algorithm to deparse a query as described in
previous mail. The logic is largely coded in deparseFromExprForRel() and
foreign_join_ok(). The later one pulls up the clauses from joining
relations and first one deparses the FROM clause recursively.

While you suggested that we construct FROM clauses while constructing
fpinfo, there is problem maintaining params_list. While we deparse the
clauses, we will build the params lists independently for the joining
relations and might have conflicting param indexes depending upon when a
particular node is seen. Also, during the process as more and more outer
relations get added to the join tree being pushed down, some parameters
might vanish. So, if have to build it while we construct fpinfo we will
have to find a way to sanitize the params_list.

I have modified the deparseColumnRef and related functions to output
<relation alias>.<column name> while deparsing the join tree. Updated
deparseLockingClause to output FOR SHARE/FOR UPDATE clauses at the
outermost query level. Removed the useless functions and structure members.

I haven't run pgindent on the changes yet. Sorry. Mostly, I will be able to
do that for the next version.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_fdw_core_v7.patch text/plain 7.8 KB
pg_fdw_join_v7.patch text/plain 138.5 KB
pg_join_pd_v7.patch text/plain 175.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-02-03 17:18:02 Re: 2016-01 Commitfest
Previous Message Robert Haas 2016-02-03 16:55:14 Re: [POC] FETCH limited by bytes.