Re: Push down more full joins in postgres_fdw

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down more full joins in postgres_fdw
Date: 2016-09-08 10:51:00
Message-ID: 3e5989a9-c322-cc0d-1476-cd0707675d10@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/09/06 22:07, Ashutosh Bapat wrote:
> On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp <mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>> wrote:

> On 2016/08/22 15:49, Ashutosh Bapat wrote:
> 1. deparsePlaceHolderVar looks odd - each of the deparse*
> function is
> named as deparse + <name of the parser node the string would parse
> into>. PlaceHolderVar is not a parser node, so no string is
> going to be
> parsed as a PlaceHolderVar. May be you want to name it as
> deparseExerFromPlaceholderVar or something like that.

> The name "deparseExerFromPlaceholderVar" seems long to me. How
> about "deparsePlaceHolderExpr"?

> There isn't any node with name PlaceHolderExpr.

I'll rename it to "deparseExerInPlaceholderVar", then.

> 3. I think registerAlias stuff should happen really at the time of
> creating paths and should be stored in fpinfo. Without that it
> will be
> computed every time we deparse the query. This means every time
> we try
> to EXPLAIN the query at various levels in the join tree and once
> for the
> query to be sent to the foreign server.

> Hmm. I think the overhead in calculating aliases would be
> negligible compared to the overhead in explaining each remote query
> for costing or sending the remote query for execution. So, I
> created aliases in the same way as remote params created and stored
> into params_list in deparse_expr_cxt. I'm not sure it's worth
> complicating the code.

> We defer remote parameter creation till deparse time since the the
> parameter numbers are dependent upon the sequence in which we deparse
> the query. Creating them at the time of path creation and storing them
> in fpinfo is not possible because a. those present in the joining
> relations will conflict with each other and need some kind of
> serialization at the time of deparsing b. those defer for differently
> parameterized paths or paths with different pathkeys. We don't defer it
> because it's very light on performance.
>
> That's not true with the alias information. As long as we detect which
> relations need subqueries, their RTIs are enough to create unique
> aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
> can have alias r123 without conflicting with any other alias.

Hmm. But another concern about the approach of generating an subselect
alias for a path, if needed, at the path creation time would be that the
path might be rejected by add_path, which would result in totally
wasting cycles for generating the subselect alias for the path.

> However minimum overhead it might have, we will deparse the query every
> time we create a path involving that kind of relation i.e. for different
> pathkeys, different parameterization and different joins in which that
> relation participates in. This number can be significant. We want to
> avoid this overhead if we can.

Exactly. As I said above, I think the overhead would be negligible
compared to the overhead in explaining each remote query for costing or
the overhead in sending the final remote query for execution, though.

> 5. The blocks related to inner and outer relations in
> deparseFromExprForRel() look same. We should probably separate
> that code
> out into a function and call it at two places.

> Done.

> I see you have created function deparseOperandRelation() for the
> same. I guess, this should be renamed as deparseRangeTblRef() since it
> will create RangeTblRef node when parsed back.

OK, if there no opinions of others, I'll rename it to the name proposed
by you, "deparseRangeTblRef".

> 6.
> ! if (is_placeholder)
> ! errcontext("placeholder expression at position %d in
> select list",
> ! errpos->cur_attno);
> A user wouldn't know what a placeholder expression is, there is
> no such
> term in the documentation. We have to device a better way to
> provide an
> error context for an expression in general.

> Though I proposed that, I don't think that it's that important to
> let users know that the expression is from a PHV. How about just
> saying "expression", not "placeholder expression", so that we have
> the message "expression at position %d in select list" in the context?

> Hmm, that's better than placeholder expression, but not as explanatory
> as it should be since we won't be printing the "select list" in the
> error context and user won't have a clue as to what error context is
> about.

I don't think so. Consider an example of the conversion error message,
which is from the regression test:

SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND
ft1.c1 = 1;
ERROR: invalid input syntax for integer: "foo"
CONTEXT: whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remote
query for execution. So, ISTM it's reasonable to print something like
"expression at position %d in select list" in the context if an
expression in a PHV.

> But I don't have any good suggestions right now. May be we should
> print the whole expression? But that can be very long, I don't know.

ISTM that it's a bit too expensive to print the whole expression in the
error context.

> This patch tries to do two things at a time 1. support join pushdown for
> FULL join when the joining relations have remote conditions 2. better
> support for fetching placeholder vars, whole row references and some
> system columns. To make reviews easy, I think we should split the patch
> into two 1. supporting subqueries to be deparsed with support for one of
> the above (I will suggest FULL join support) 2. support for the other.

OK, will try.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2016-09-08 10:55:03 Re: Push down more UPDATEs/DELETEs in postgres_fdw
Previous Message Simon Riggs 2016-09-08 10:18:16 Re: Bug in two-phase transaction recovery