Re: Push down more full joins in postgres_fdw

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down more full joins in postgres_fdw
Date: 2016-10-25 09:58:07
Message-ID: CAFjFpRcn7=DNOXm-PJ_jVDqAmghKVf6tApQC+_RgMZ8tOPExcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>> 13. The comment below is missing the main purpose i.e. creating a a unique
>> alias, in case the relation gets converted into a subquery. Lowest or
>> highest
>> relid will create a unique alias at given level of join and that would be
>> more
>> future proof. If we decide to consider paths for every join order,
>> following
>> comment will no more be true.
>> + /*
>> + * Set the relation index. This is defined as the position of this
>> + * joinrel in the join_rel_list list plus the length of the rtable
>> list.
>> + * Note that since this joinrel is at the end of the list when we are
>> + * called, we can get the position by list_length.
>> + */
>> + fpinfo->relation_index =
>> + list_length(root->parse->rtable) +
>> list_length(root->join_rel_list);
>
>
> I don't agree on that point. As I said before, the approach using the
> lowest/highest relid would make a remote query having nested subqueries
> difficult to read. And even if we decided to consider paths for every join
> order, we could define the relation_index safely, by modifying
> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
> proposed way at the first time it is called and (2) avoids setting the
> relation_index after the first call, by checking to see
> fpinfo->relation_index > 0.

OK. Although, the alias which contains a number, which apparently
doesn't show any relation with the relation (no pun intended :)) being
deparsed, won't make much sense in the deparsed query. But I am fine
leaving this to the committer's judgement. May be you want to add an
assert here making sure that relation_index is set only once.
Something like Assert(fpinfo->relation_index == 0) just before setting
it.

>
>> 14. The function name talks about non-vars but the Assert few lines below
>> asserts that every node there is a Var. Need better naming.
>> +reltarget_has_non_vars(RelOptInfo *foreignrel)
>> +{
>> + ListCell *lc;
>> +
>> + foreach(lc, foreignrel->reltarget->exprs)
>> + {
>> + Var *var = (Var *) lfirst(lc);
>> +
>> + Assert(IsA(var, Var));
>> And also an explanation for the claim
>> + * Note: currently deparseExplicitTargetList can't properly handle such
>> Vars.
>
>
> Will remove this function for the reason described in #12.
>
>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
>
>
> Right. I think that not only avoids creating redundant data to find the
> alias of a given Var node but also would be able to find it in optimal cost.
>
>> Instead of that, we should probably gather all the alias information once
>> and
>> store it in context as an array indexed by relid. While deparsing a Var we
>> can
>> get the targetlist and alias for a given relation by using var->varno as
>> index.
>> And then search for given Var node in the targetlist to deparse the column
>> name
>> by its position in the targetlist. For the relations that are not
>> converted
>> into subqueries, this array will not hold any information and the Var will
>> be
>> deparsed as it's being done today.
>
>
> Sorry, I don't understand this part. How does that work when creating a
> remote query having nested subqueries? Is that able to search for a given
> Var node efficiently?

For every relation that is deparsed as a subquery, we will create a
separate context. Each such context will have an array described
above. This array will contain the targetlist and aliases for all the
relations, covered by that relation, that are required to be deparsed
as subqueries. These arrays bubble up to topmost relation that is not
required to be deparsed as subquery. When a relation is required to be
deparsed as a subquery, its immediate upper relation sets the alias
and targetlist chosen for that relation at all the indexes
corresponding to all the base relations covered by that relation. For
example, let's assume that a relation (1, 2, 3) is required to be
deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
(thus the other joining relation being (4,5)). While deparsing for
relation (1,2,3,4,5), the context will contain a 5 element array, with
positions 1, 2, 3 filled by same targetlist and alias whereas
positions 4 and 5 will not be filled as those relations are not
deparsed as subqueries. Let's assume in relation (1, 2, 3), (1, 3) in
turn requires subquery but (2) does not. Thus the context created
while deparsing (1, 2, 3) will have a 3 element array with positions 1
and 3 containing the same targetlist and alias, where as position 2
will be empty. When deparsing a Var node with varno = N and varattno =
m, if the nth position in the array in the context is empty, that Var
node will be deparsed as rN.<column name>. But if that position is has
alias sZ, then we search for that Var node in the targetlist and if
it's found at kth position in the targetlist, we will deparse it as
sZ.ck. The search in the targetlist can be performed using
tlist_member, and then fetching the position by TargetEntry::resno.
This does not require any recursion and thus saves stack space and
some CPU cycles required for recursion. I guess, the arrays need to be
computed only once for any relation when the query for that relation
is deparsed the first time.

Thanks for considering other suggestions.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-10-25 10:12:59 Re: pg_basebackup stream xlog to tar
Previous Message Peter Moser 2016-10-25 09:44:02 Re: [PROPOSAL] Temporal query processing with range types