Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2016-01-28 16:26:56
Message-ID: CAFjFpRfLmq9KBipnJ2fvhBjeOXVGyPpbFDVfMdZCQfsvQy69nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
Here's an updated version of the previous patches, broken up like before
1. pg_fdw_core_v3.patch: changes in core - more description below
2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below
3. pg_join_pd_v3.patch: combined patch for easy testing

Here is the summary of changes from the last set of patches
1. Removed GUC enable_foreignjoin as well as extra documentation for
GetForeignTable().

2. Included fix for EvalPlanQual in postgres_fdw - an alternate local path
is obtained from the list of paths linked to the joinrel. Since this is
done before adding the ForeignPath, we should be a local path available for
given join.

3. Moved code to obtain user mapping id for given relation from
get_relation_info() to build_simple_rel() to avoid an extra call to
planner_rt_fetch().

4. Plan cache invalidation logic when the user which tries to execute a
cached plan is different from a user which created the plan. Also, plan
cache invalidation logic in case there are changes to user mapping system
cache. An example case is Public user mapping was used while planning but
before the (cached) plan was executed again, the user mapping for one of
the effective users involved was added with different credentials. We
should expect the new user mapping to be used for fetching data from
corresponding foreign table and thus join becomes unsafe to be pushed down.
Added tests in postgres_fdw.sql for these two issues.

5. removed find_var_pos() and instead inlined the logic into its caller as
suggested by Robert. get_jointype_name() uses switch.

6. The functions in deparse.c name the functions depending upon the object
that parser will create on parsing the output produced by those functions.
E.g. deparseColumnRef() produced column names which when parsed would
produce ColumnRef object. In this patch, the new functions added use
similar convention. Now there are multiple functions which would produce
output which when parsed would create same object. In order to have
different names for such functions, the names also include the purpose of
deparsing or kind of input etc. deparseJoinVar in the previous patch is now
deparseColumnRefForJoinrel() since it produces column references for a join
relation. There is corresponding deparseColumnRefForBaserel(). I have not
changed the name of existing deparseColumnRef, which deparses the
non-system column names of a foreign table. There are many changes to
comments making them more readable and also some modularization. Let me
know if the new names make sense.

On Mon, Jan 25, 2016 at 10:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jan 21, 2016 at 8:36 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> >> What this patch does to the naming and calling conventions in
> >> deparse.c does not good. Previously, we had deparseTargetList().
> >> Now, we sometimes use that and sometimes deparseAttrsUsed() for almost
> >> the same thing.
> >
> > Previously deparseTargetList deparsed the SELECT or RETURNING clause by
> > including list of name of attributes provided by attrs_used. That's now
> done
> > by deparseAttrsUsed(). In current path deparseTargetList() deparses the
> > targetlist i.e. list of TargetEntry nodes (right now only Vars). Although
> > these two functions return comma separated string of column names, their
> > inputs are different. deparseAttrsUsed() can never be called for more
> than
> > one base relation. deparseTargetList() on the other hand can deparse a
> > targetlist with Var nodes from multiple relations. We need those two
> > functionalities separately. We might convert attrs_used into a list of
> > TargetEntry nodes using build_tlist_to_deparse() and use
> deparseTargetList
> > everywhere. A side effect of that would be separating retrieved_attrs
> > collection from deparsing code. I didn't do that change in this version
> to
> > avoid large code changes. But I am fine doing that, if that makes code
> > readable.
> >
> > If we have to keep old deparseTargetList as is (and don't rename it as
> > deparseAttrsUsed), we can rename the new deparseTargetList as something
> > different may be deparseSelectList. I am fine with that too. But we need
> the
> > later functionality, whatever its name be.
>
> I'm not arguing that we don't need the functionality. I'm arguing
> that if we've got a set of existing functions that are named one way,
> we shouldn't get a whole bunch of new functions that invent an
> entirely new naming convention. I'm not sure exactly how to clean
> this up, but I think we need to find a way.
>
> >> Previously, we had deparseColumnRef(); now we have
> >> both that and deparseJoinVar() doing very similar things. But in each
> >> case, the function names are not parallel and the calling conventions
> >> are totally different. Like here:
> >>
> >> + if (context->foreignrel->reloptkind == RELOPT_JOINREL)
> >> + deparseJoinVar(node, context);
> >> + else
> >> + deparseColumnRef(buf, node->varno,
> >> node->varattno, context->root);
> >>
> >> We pass the buf separately to deparseColumnRef(), but for some reason
> >> not to deparseJoinVar().I wonder if these functions need to be two
> >> separate things or if the work done by deparseJoinVar() should
> >> actually be part of deparseColumnRef(). But even if it needs to be
> >> separate, I wonder why we can't arrange things so that they get the
> >> same arguments, more or less.
> >
> > deparseVar() is called for any Var node that's encountered.
> deparseJoinVar
> > is called to deparse a Var from join relation, which is supposed to
> output
> > INNER or OUTER var's alias as used in INNER or OUTER subqueries. It does
> not
> > output the real column names. deparseColumnRef() however is the same old
> > thing; it deparses column of given base relation. They are not similar
> > things.
>
> deparseColumnRef() emits things like "foo" meaning column foo, or
> "foo.bar" meaning column bar of table foo. deparseJoinVar() emits
> things like "r.a7", referring to a column called "a7" in a relation
> called "r". I feel that those *are* similar things.
>
> I also wonder whether they couldn't be made more similar. It seems to
> me this patch is going to realias things potentially multiple times
> for its own convenience. That's not a catastrophe, but it's not
> great, either, because it produces queries that are not necessarily
> very human readable. It would be nicer to get
> actual_table_name.actual_column_name in more places and r.a7 in fewer.
>
> > I agree that the code is complex for a reader. One of the reasons is
> > recursive nature of deparsing. I will try to make it more cleaner and
> easier
> > to understand. Would adding a call tree for deparsing routines help
> here? Or
> > improving function prologues of even the existing functions?
>
> I don't think so. A README might help, but honestly I think some of
> these APIs really just need to be revised.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_fdw_core_v3.patch application/x-download 17.1 KB
pg_fdw_join_v3.patch application/x-download 145.1 KB
pg_join_pd_v3.patch application/x-download 201.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-01-28 16:28:19 Re: CustomScan under the Gather node?
Previous Message Robert Haas 2016-01-28 16:24:58 Re: HEADSUP: gitmaster.postgresql.org - upgrade NOW