Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Date: 2016-01-25 17:13:54
Message-ID: CA+TgmoZK5Vq9RZ8egO7sBOF2NtvDX8NbAameK4K=KSH94R-TOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 21, 2016 at 8:36 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> What this patch does to the naming and calling conventions in
>> deparse.c does not good. Previously, we had deparseTargetList().
>> Now, we sometimes use that and sometimes deparseAttrsUsed() for almost
>> the same thing.
>
> Previously deparseTargetList deparsed the SELECT or RETURNING clause by
> including list of name of attributes provided by attrs_used. That's now done
> by deparseAttrsUsed(). In current path deparseTargetList() deparses the
> targetlist i.e. list of TargetEntry nodes (right now only Vars). Although
> these two functions return comma separated string of column names, their
> inputs are different. deparseAttrsUsed() can never be called for more than
> one base relation. deparseTargetList() on the other hand can deparse a
> targetlist with Var nodes from multiple relations. We need those two
> functionalities separately. We might convert attrs_used into a list of
> TargetEntry nodes using build_tlist_to_deparse() and use deparseTargetList
> everywhere. A side effect of that would be separating retrieved_attrs
> collection from deparsing code. I didn't do that change in this version to
> avoid large code changes. But I am fine doing that, if that makes code
> readable.
>
> If we have to keep old deparseTargetList as is (and don't rename it as
> deparseAttrsUsed), we can rename the new deparseTargetList as something
> different may be deparseSelectList. I am fine with that too. But we need the
> later functionality, whatever its name be.

I'm not arguing that we don't need the functionality. I'm arguing
that if we've got a set of existing functions that are named one way,
we shouldn't get a whole bunch of new functions that invent an
entirely new naming convention. I'm not sure exactly how to clean
this up, but I think we need to find a way.

>> Previously, we had deparseColumnRef(); now we have
>> both that and deparseJoinVar() doing very similar things. But in each
>> case, the function names are not parallel and the calling conventions
>> are totally different. Like here:
>>
>> + if (context->foreignrel->reloptkind == RELOPT_JOINREL)
>> + deparseJoinVar(node, context);
>> + else
>> + deparseColumnRef(buf, node->varno,
>> node->varattno, context->root);
>>
>> We pass the buf separately to deparseColumnRef(), but for some reason
>> not to deparseJoinVar().I wonder if these functions need to be two
>> separate things or if the work done by deparseJoinVar() should
>> actually be part of deparseColumnRef(). But even if it needs to be
>> separate, I wonder why we can't arrange things so that they get the
>> same arguments, more or less.
>
> deparseVar() is called for any Var node that's encountered. deparseJoinVar
> is called to deparse a Var from join relation, which is supposed to output
> INNER or OUTER var's alias as used in INNER or OUTER subqueries. It does not
> output the real column names. deparseColumnRef() however is the same old
> thing; it deparses column of given base relation. They are not similar
> things.

deparseColumnRef() emits things like "foo" meaning column foo, or
"foo.bar" meaning column bar of table foo. deparseJoinVar() emits
things like "r.a7", referring to a column called "a7" in a relation
called "r". I feel that those *are* similar things.

I also wonder whether they couldn't be made more similar. It seems to
me this patch is going to realias things potentially multiple times
for its own convenience. That's not a catastrophe, but it's not
great, either, because it produces queries that are not necessarily
very human readable. It would be nicer to get
actual_table_name.actual_column_name in more places and r.a7 in fewer.

> I agree that the code is complex for a reader. One of the reasons is
> recursive nature of deparsing. I will try to make it more cleaner and easier
> to understand. Would adding a call tree for deparsing routines help here? Or
> improving function prologues of even the existing functions?

I don't think so. A README might help, but honestly I think some of
these APIs really just need to be revised.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-01-25 17:39:29 Re: Set search_path + server-prepared statements = cached plan must not change result type
Previous Message Alvaro Herrera 2016-01-25 16:36:04 Re: 2016-01 Commitfest