Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2016-02-02 16:21:38
Message-ID: CAFjFpReTBA+RdDBV3pJ=0_SQ9QbNgPm_AyF79UV652bawHrQ0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 2, 2016 at 10:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Feb 1, 2016 at 6:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > More when I have a bit more time to look at this...
>
> Why does deparseSelectStmtForRel change the order of the existing
> arguments? I have no issue with adding new arguments as required, but
> rearranging the existing argument order doesn't serve any useful
> purpose that is immediately apparent.

deparseSelectStmtForRel has two sets of arguments, input and output. They
are separated in the declaration all inputs come first, followed by all
outputs. The inputs were ordered according to their appearance in SELECT
statement, so I added tlist before remote_conds. I should have added
relations, which is an output argument, at the end, but I accidentally
added it between existing output arguments. Anyway, I will go ahead and
just add the new arguments after the existing ones.

> There's also no real need for
> the rel -> foreignrel renaming.
>

That was an unintentional change during merge. Sorry for that. Reverted it.

>
> + /*
> + * If we have constructed the SELECT clause from the targetlist,
> construct
> + * the retrieved attributes list as continuously increasing list of
> + * integers.
> + */
> + if (retrieved_attrs && tlist)
> + {
> + int i;
> + *retrieved_attrs = NIL;
> + for (i = 1; i <= list_length(tlist); i++)
> + *retrieved_attrs = lappend_int(*retrieved_attrs, i);
> + }
>
>
This is really wonky. First, you pass retrieved_attrs to
> deparseSelectSqlForBaseRel, but then you have this code which blows it
> away and replaces it if tlist != NIL. So I guess this will run always
> for a join relation, and for a base relation only sometimes. But
> that's certainly not at all clear. I think you need to find some way
> of rearranging this so that it's more obvious what is going on here.
>

I have pushed retrieved_attrs as an argument to deparseExplicitTargetList()
and deparseSelectSqlForJoinRel() to keep the things consistent.

> I suggest not renaming the existing deparseTargetList() and instead
> coming up with a different name for the new thing you need, maybe
> deparseExplicitTargetList().
>

Done.

>
> How about adding another sentence to the header comment for
> appendConditions() saying something like "This is used for both WHERE
> clauses and for JOIN .. ON"?
>

Done.

>
> + * The targetlist is list of TargetEntry's which in turn contains Var
> nodes.
>
> contain.
>
>
Done.

> +/*
> + * Output join name for given join type */
>
> Formatting.

Done.

> This patch, overall, is badly in need of a pgindent run.
>

Sorry, I haven't run pgindent on the attached patches. But will do that
next.

>
> + /*
> + * XXX
> + * Since the query is being built in recursive manner from bottom up,
> + * the FOR UPDATE/SHARE clause referring the base relations can
> not be added
> + * at the top level. They need to be added to the subqueries
> corresponding
> + * to the base relations. This has an undesirable effect of locking
> more
> + * rows than specified by user, as it locks even those rows from base
> + * relations which are not part of the final join result. To avoid
> this
> + * undesirable effect, we need to build the join query without the
> + * subqueries, which for now, seems hard.
> + */
>
> This is a second good reason that we should actually do that work
> instead of complaining that it's too hard. The first one is that the
> queries that come out of this right now are too long and hard to read.
> I actually don't see why this is all that hard. Deparsing the target
> list is simple enough; you just need to emit tab.col using varno to
> determine what tab looks like and varattno to determine what col looks
> like. The trickier part is emitting the FROM clause. But this doesn't
> seem all that hard either. Suppose that when we construct the fpinfo
> (PgFdwRelationInfo *) for a relation, we include in it a FROM clause
> appropriate to that rel. So, for a baserel, it's something like "foo
> r4" where 4 is foo's RTI. For a joinrel, do this:
>
> 1. Emit the FROM clause constructed for the outer relation,
> surrounding it with parentheses if the outer relation is a joinrel.
> 2. Emit " JOIN ", " LEFT JOIN ", " RIGHT JOIN ", or " FULL JOIN "
> according to the join type.
> 3. Emit the FROM clause constructed for the inner relation,
> surrounding it with parentheses if the inner relation is a joinrel.
> 4. Emit " ON ".
> 5. Emit the joinqual.
>
> This will produce nice things like (foo r3 JOIN bar r4 ON r3.x = r4.x)
> JOIN baz r2 ON r3.y = r2.y
>
> Then, you'd also need to stuff the conditions into the
> PgFdwRelationInfo so that those could be added to paths constructed at
> higher levels. But that's not too hard either. Basically you'd end
> up with mostly the same stuff you have now, but the PgFdwRelationInfo
> would store a join clause and a set of deparsed quals to be included
> in the FROM and WHERE clauses respectively. And then you'd pull the
> information from the inner and outer sides to build up what you need
> at the joinrel level.
>

I was thinking on the similar lines except rN aliases. I think there will
be problem for queries like
postgres=# explain verbose select * from lt left join (select bar.a, foo.b
from bar left join foo on (bar.a = foo.a) where bar.b + foo.b < 10) q on
(lt.b = q.b);
QUERY
PLAN
--------------------------------------------------------------------------------
Hash Right Join (cost=318.03..872.45 rows=43 width=16)
Output: lt.a, lt.b, bar.a, foo.b
Hash Cond: (foo.b = lt.b)
-> Merge Join (cost=317.01..839.07 rows=8513 width=8)
Output: bar.a, foo.b
Merge Cond: (bar.a = foo.a)
*Join Filter: ((bar.b + foo.b) < 10)*
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Output: bar.a, bar.b
Sort Key: bar.a
-> Seq Scan on public.bar (cost=0.00..32.60 rows=2260
width=8)
Output: bar.a, bar.b
-> Sort (cost=158.51..164.16 rows=2260 width=8)
Output: foo.b, foo.a
Sort Key: foo.a
-> Seq Scan on public.foo (cost=0.00..32.60 rows=2260
width=8)
Output: foo.b, foo.a
-> Hash (cost=1.01..1.01 rows=1 width=8)
Output: lt.a, lt.b
-> Seq Scan on public.lt (cost=0.00..1.01 rows=1 width=8)
Output: lt.a, lt.b
(21 rows)

The subquery q is pulled up, so there won't be trace of q in the join tree
except may be a useless RTE for the subquery. There will be RelOptInfo
representing join between lt, bar and foo and a RelOptInfo for join between
bar and foo. The join filter bar.b + foo.b < 10 needs to be evaluated
before joining (bar, foo) with lt and should go with bar left join foo. But
the syntax doesn't support something like "bar left join foo on (bar.a =
foo.a) where bar.b + foo.b". So we will have to construct a SELECT
statement for this join and add to the FROM clause with a subquery alias
and then refer the columns of foo and bar with that subquery alias.

Further during the process of qual placement, quals that can be evaluated
at the level of given relation in the join tree are attached to that
relation if they can be pushed down. Thus if we see a qual attached to a
given relation, AFAIU, we can not say whether it needs to be evaluated
there (similar to above query) or planner pushed it down for optimization,
and thus for every join relation with quals we will need to build
subqueries with aliases.

I am still looking at how we can make this work.

> This would actually be faster than what you've got right now, because
> right now you're recursing down the whole join tree all over again at
> each new join level, maybe not the best plan.

>
+ if (foreignrel->reloptkind == RELOPT_JOINREL)
> + return;
> +
> /* Add any necessary FOR UPDATE/SHARE. */
> deparseLockingClause(&context);
>

Done.

>
> Generally, I think in these kinds of cases it is better to reverse the
> test and get rid of the return statement, like this:
>
> if (foreignrel->reloptkind != RELOPT_JOINREL)
> deparseLockingClause(&context);
>
> The way you wrote it, somebody who wants to add more code to the end
> of the function will probably have to make that change anyhow.
> Really, your intention here was to skip that code for joins, not to
>
skip the rest of the function for baserels.
>

Yes, you are right. Actually before deparseLockingClause change was
committed, all of that code was here, so this way worked. But I forgot to
change it while merging. Thanks for pointing out.

>
> @@ -1424,9 +1875,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
> printRemoteParam(pindex, node->vartype, node->vartypmod,
> context);
> }
> else
> - {
> printRemotePlaceholder(node->vartype, node->vartypmod,
> context);
> - }
> }
> }
>
> Useless hunk.
>

Reverted the change.

>
> + * Constructing queries representating SEMI and ANTI joins is hard,
> hence
>
> Typo.
>

Done. Thanks for pointing out the typos.

Attached patches
pg_fdw_core_v6.patch: core changes
pg_fdw_join_v6.patch: postgres_fdw changes for join pushdown
pg_join_pd_v6.patch: combined patch for ease of testing.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_fdw_core_v6.patch text/plain 7.8 KB
pg_fdw_join_v6.patch text/plain 134.2 KB
pg_join_pd_v6.patch text/plain 170.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-02-02 16:23:57 Re: pgbench - allow backslash-continuations in custom scripts
Previous Message Ashutosh Bapat 2016-02-02 16:18:12 Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)