Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-10-21 04:34:34
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80115B317@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp]
> Sent: Wednesday, October 21, 2015 12:31 PM
> To: Robert Haas
> Cc: Tom Lane; Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI;
> pgsql-hackers(at)postgresql(dot)org; Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/20 13:11, Etsuro Fujita wrote:
> > On 2015/10/20 5:34, Robert Haas wrote:
> >> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita
> >> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >>> As Tom mentioned, just recomputing the original join tuple is not good
> >>> enough. We would need to rejoin the test tuples for the baserels even if
> >>> ROW_MARK_COPY is in use. Consider:
> >>>
> >>> A=# BEGIN;
> >>> A=# UPDATE t SET a = a + 1 WHERE b = 1;
> >>> B=# SELECT * from t, ft1, ft2
> >>> WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
> >>> A=# COMMIT;
> >>>
> >>> where the plan for the SELECT FOR UPDATE is
> >>>
> >>> LockRows
> >>>   -> Nested Loop
> >>>        -> Seq Scan on t
> >>>        -> Foreign Scan on <ft1, ft2>
> >>>             Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
> >>>             AND ft1.a = $1 AND ft2.b = $2
> >>>
> >>> If an EPQ recheck is invoked by A's UPDATE, just recomputing the
> >>> original join tuple from the whole-row image, as you proposed, would
> >>> output an incorrect result in the EPQ recheck, since the value a in the
> >>> updated version of a to-be-joined tuple in t would no longer match the
> >>> value ft1.a extracted from the whole-row image if A's UPDATE has
> >>> committed successfully. So I think we would need to rejoin the tuples
> >>> populated from the whole-row images for the baserels ft1 and ft2, by
> >>> executing the secondary plan with the new parameter values for a and b.
>
> >> No. You just need to populate fdw_recheck_quals correctly, same as
> >> for the scan case.
>
> > Yeah, I think we can probably do that for the case where a pushed-down
> > join clause is an inner-join one, but I'm not sure that we can do that
> > for the case where that clause is an outer-join one. Maybe I'm missing
> > something, though.
>
> As I said yesterday, that opinion of mine was completely wrong. Sorry for
> the incorrectness. Let me explain a little bit more. I still think
> that even if ROW_MARK_COPY is in use, we would need to locally rejoin
> the tuples populated from the whole-row images for the foreign tables
> involved in a remote join, using a secondary plan. Consider, e.g.,
>
> SELECT localtab.*, ft2 from localtab, ft1, ft2
> WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE
>
> In this case, since the output of the foreign join would not include any
> ft1 columns, I don't think we could do the same thing as for the scan
> case, even if populating fdw_recheck_quals correctly.
>
As an aside, could you explain why you think so? It is a significant
point in this discussion if we want to reach consensus.

It looks to me as though the explanation above mixes up the target list of
the user query with the target list of the remote query.
If the EPQ mechanism requires a joined tuple on ft1 and ft2, the FDW driver
can construct a remote query as follows:
  SELECT ft2, ft1.y, ft1.x, ft2.x FROM ft1, ft2 WHERE ft1.x = ft2.x FOR UPDATE
Thus, fdw_scan_tlist has four target entries, but the latter two are
resjunk=true, because the ForeignScan node drops these columns by projection
when it returns a tuple to the upper node.
On the other hand, the joined tuple we are talking about in this context
is the tuple prior to projection, formed according to fdw_scan_tlist.
So it contains all the information necessary to run the scan/join qualifiers
against the joined tuple. It is not affected by the target list of the user
query.
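
To make the distinction concrete, here is a rough toy model in Python. It
is not PostgreSQL code: the TargetEntry class, the helper functions, and the
sample values are all hypothetical and only mimic the idea that the
pre-projection tuple follows fdw_scan_tlist while projection drops the
resjunk entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetEntry:
    name: str      # label of the column in the scan tuple
    resjunk: bool  # True: internal use only, dropped by projection


# Hypothetical fdw_scan_tlist for the remote join of ft1 and ft2.
FDW_SCAN_TLIST = [
    TargetEntry("ft2", resjunk=False),    # whole-row value wanted by the user query
    TargetEntry("ft1.y", resjunk=False),  # needed for the join with localtab
    TargetEntry("ft1.x", resjunk=True),   # needed only to recheck ft1.x = ft2.x
    TargetEntry("ft2.x", resjunk=True),
]


def form_scan_tuple(values):
    """Build the pre-projection joined tuple in fdw_scan_tlist order."""
    if len(values) != len(FDW_SCAN_TLIST):
        raise ValueError("value count does not match fdw_scan_tlist")
    return {te.name: v for te, v in zip(FDW_SCAN_TLIST, values)}


def project(scan_tuple):
    """Drop resjunk entries, as done before returning a tuple to the upper node."""
    return {te.name: scan_tuple[te.name]
            for te in FDW_SCAN_TLIST if not te.resjunk}


def recheck(scan_tuple, localtab_y):
    """Evaluate the join qual and the parameterized qual on the pre-projection tuple."""
    return (scan_tuple["ft1.x"] == scan_tuple["ft2.x"]
            and scan_tuple["ft1.y"] == localtab_y)


# Sample joined tuple: ft2 whole row, ft1.y, ft1.x, ft2.x (made-up values).
joined = form_scan_tuple([(10, "foo"), 5, 10, 10])
projected = project(joined)
```

Here recheck() can still see ft1.x and ft2.x in the joined tuple, although
project() hands only ft2 and ft1.y to the upper node.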

Even though I think the joined-tuple reconstruction approach is a
reasonable solution here, this example is not a fair argument against
Robert's suggestion.

> And I think we
> would need to rejoin the tuples, using a local join execution plan,
> which would have the parameterization for the to-be-pushed-down clause
> ft1.y = localtab.y. I'm still missing something, though.
>
Also, please don't mix up "what we do" and "how we do it".

Which format of tuple the extension returns to the core backend is a
"what we do" question, because it determines the role of the interface.
If our consensus is to return a joined tuple, we need to design the
interface according to that consensus.

On the other hand, whether we should force every FDW/CSP extension to
have an alternative plan is a "how we do it" question.
Once we have a consensus on "what we do", there are various options for
meeting the requirement it implies; however, we cannot settle "how we do
it" before "what we do".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
