Re: Push down more full joins in postgres_fdw

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down more full joins in postgres_fdw
Date: 2016-09-13 05:17:01
Message-ID: CAFjFpRdyEcLkwHTzTd23SdWgNNQWqxFJaqPHqoeXjro2DX7p0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> 3. I think registerAlias stuff should happen really at the time of
>> creating paths and should be stored in fpinfo. Without that it
>> will be
>> computed every time we deparse the query. This means every time
>> we try
>> to EXPLAIN the query at various levels in the join tree and once
>> for the
>> query to be sent to the foreign server.
>>
>
> Hmm. I think the overhead in calculating aliases would be
>> negligible compared to the overhead in explaining each remote query
>> for costing or sending the remote query for execution. So, I
>> created aliases in the same way as remote params created and stored
>> into params_list in deparse_expr_cxt. I'm not sure it's worth
>> complicating the code.
>>
>
> We defer remote parameter creation till deparse time since the the
>> parameter numbers are dependent upon the sequence in which we deparse
>> the query. Creating them at the time of path creation and storing them
>> in fpinfo is not possible because a. those present in the joining
>> relations will conflict with each other and need some kind of
>> serialization at the time of deparsing b. those defer for differently
>> parameterized paths or paths with different pathkeys. We don't defer it
>> because it's very light on performance.
>>
>> That's not true with the alias information. As long as we detect which
>> relations need subqueries, their RTIs are enough to create unique
>> aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
>> can have alias r123 without conflicting with any other alias.
>>
>
> Hmm. But another concern about the approach of generating an subselect
> alias for a path, if needed, at the path creation time would be that the
> path might be rejected by add_path, which would result in totally wasting
> cycles for generating the subselect alias for the path.
>

A path may get rejected but the relation is not rejected. The alias applies
to a relation and its targetlist, which will remain same for all paths
created for that relation, esp when it's a base relation or join. So, AFAIU
that work never gets wasted. Also, for costing paths with
use_remote_estimate, we deparse the query, which builds the alias
information again and again for very path. That's much worse than building
it once for a given relation.

>
> However minimum overhead it might have, we will deparse the query every
>> time we create a path involving that kind of relation i.e. for different
>> pathkeys, different parameterization and different joins in which that
>> relation participates in. This number can be significant. We want to
>> avoid this overhead if we can.
>>
>
> Exactly. As I said above, I think the overhead would be negligible
> compared to the overhead in explaining each remote query for costing or the
> overhead in sending the final remote query for execution, though.
>

It won't remain minimal as the number of paths created increases,
increasing the number of times a query is deparsed. We deparse query every
time time we cost a path for a relation with use_remote_estimates true. As
we try to push down more and more stuff, we will create more paths and
deparse the query more time.

Also, that makes the interface better. Right now, in your patch, you have
changed the order of deparsing in the existing code, so that the aliases
are registered while deparsing FROM clause and before any Var nodes are
deparsed. If we create aliases at the time of path creation, only once in
GetForeignJoinPaths or GetForeignPaths as appropriate, that would require
less code churn and would save some CPU cycles as well.

>
> 6.
>> ! if (is_placeholder)
>> ! errcontext("placeholder expression at position %d in
>> select list",
>> ! errpos->cur_attno);
>> A user wouldn't know what a placeholder expression is, there is
>> no such
>> term in the documentation. We have to device a better way to
>> provide an
>> error context for an expression in general.
>>
>
> Though I proposed that, I don't think that it's that important to
>> let users know that the expression is from a PHV. How about just
>> saying "expression", not "placeholder expression", so that we have
>> the message "expression at position %d in select list" in the context?
>>
>
> Hmm, that's better than placeholder expression, but not as explanatory
>> as it should be since we won't be printing the "select list" in the
>> error context and user won't have a clue as to what error context is
>> about.
>>
>
> I don't think so. Consider an example of the conversion error message,
> which is from the regression test:
>
> SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
> ft1.c1 = 1;
> ERROR: invalid input syntax for integer: "foo"
> CONTEXT: whole-row reference to foreign table "ft1"
>
> As shown in the example, the error message is displayed under a remote
> query for execution. So, ISTM it's reasonable to print something like
> "expression at position %d in select list" in the context if an expression
> in a PHV.
>

I missed it. Sorry. Looks ok.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-09-13 05:27:43 Re: 9.6 TAP tests and extensions
Previous Message Amit Langote 2016-09-13 04:48:57 Re: Partition-wise join for join between (declaratively) partitioned tables