Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-10-01 02:15:29
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80114D442@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Robert Haas
> Sent: Wednesday, September 30, 2015 6:55 AM
> To: Etsuro Fujita
> Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > I thought the same thing [1]. While I thought it was relatively easy to
> > make changes to RefetchForeignRow that way for the foreign table case
> > (scanrelid>0), I was not sure how hard it would be to do so for the foreign
> > join case (scanrelid==0). So, I proposed to leave those changes for 9.6.
> > I'll have a rethink on this issue along the lines of that approach.
>
> Well, I spent some more time looking at this today, and testing it out
> using a fixed-up version of your foreign_join_v16 patch, and I decided
> that RefetchForeignRow is basically a red herring. That's only used
> for FDWs that do late row locking, but postgres_fdw (and probably many
> others) do early row locking, in which case RefetchForeignRow never
> gets called. Instead, the row is treated as a "non-locked source row"
> by ExecLockRows (even though it is in fact locked) and is re-fetched
> by EvalPlanQualFetchRowMarks. We should probably update the comment
> about non-locked source rows to mention the case of FDWs that do early
> row locking.
>
Indeed, select_rowmark_type() chooses ROW_MARK_COPY for a foreign table
when the FDW does not define the GetForeignRowMarkType callback.
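
To make that fallback concrete, here is a minimal standalone sketch in C.
It is a simplified model, not the PostgreSQL source: MarkKind, RelKind,
RelInfo, choose_mark() and sample_late_lock_callback() are hypothetical
names, and the real function also distinguishes several lock strengths
that this sketch collapses into a single MARK_LOCK value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the planner's row mark types. */
typedef enum { MARK_LOCK, MARK_REFERENCE, MARK_COPY } MarkKind;
typedef enum { REL_REGULAR, REL_FOREIGN, REL_NOT_A_TABLE } RelKind;

/* Models the optional GetForeignRowMarkType callback of an FDW. */
typedef MarkKind (*GetMarkTypeFn)(void);

typedef struct
{
    RelKind       kind;
    GetMarkTypeFn get_foreign_row_mark_type; /* NULL if the FDW omits it */
} RelInfo;

/* Example callback for an FDW that prefers to re-fetch rows later. */
MarkKind
sample_late_lock_callback(void)
{
    return MARK_REFERENCE;
}

/* Pick a row mark kind in the spirit of select_rowmark_type(). */
MarkKind
choose_mark(const RelInfo *rel, bool wants_lock)
{
    assert(rel != NULL);

    if (rel->kind == REL_NOT_A_TABLE)
        return MARK_COPY;           /* nothing to lock or re-fetch */

    if (rel->kind == REL_FOREIGN)
    {
        /* Let the FDW decide, if it provides the callback. */
        if (rel->get_foreign_row_mark_type != NULL)
            return rel->get_foreign_row_mark_type();
        /* Otherwise fall back to carrying a whole-row copy. */
        return MARK_COPY;
    }

    /* Regular table: lock the row if asked to, else just reference it. */
    return wants_lock ? MARK_LOCK : MARK_REFERENCE;
}
```

With a NULL callback the foreign table always ends up with MARK_COPY,
which is the case postgres_fdw falls into in the discussion above.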

> Anyway, everything appears to work OK up to this point: we correctly
> retrieve the saved whole-rows from the foreign side and call
> EvalPlanQualSetTuple on each one, setting es_epqTuple[rti - 1] and
> es_epqTupleSet[rti - 1]. So far, so good. Now we call
> EvalPlanQualNext, and that's where we get into trouble. We've got the
> already-locked tuples from the foreign side and those tuples CANNOT
> have gone away or been modified because we have already locked them.
> So, all the foreign join needs to do is return the same tuple that it
> returned before: the EPQ recheck was triggered by some *other* table
> involved in the plan, not our table. A local table also involved in
> the query, or conceivably a foreign table that does late row locking,
> could have had something change under it after the row was fetched,
> but in postgres_fdw that can't happen because we locked the row up
> front. And thus, again, all we need to do is re-return the same
> tuple. But we don't have that. Instead, the ROW_MARK_COPY logic has
> caused us to preserve a copy of each *baserel* tuple.
>
> Now, this is as sad as can be. Early row locking has huge advantages
> for FDWs, both in terms of minimizing server round trips and also
> because the FDW doesn't really need to do anything about EPQ. Sure,
> it's inefficient to carry around whole-row references, but it makes
> life easy for the FDW author.
>
I see the point. Would it help to add to the documentation a description
of why ROW_MARK_COPY does not need a recheck on either the local or the
remote tuples?
http://www.postgresql.org/docs/devel/static/fdw-row-locking.html

> So, if we wanted to fix this in a way that preserves the spirit of
> what's there now, it seems to me that we'd want the FDW to return
> something that's like a whole row reference, but represents the output
> of the foreign join rather than some underlying base table. And then
> get the EPQ machinery to have the evaluation of the ForeignScan for
> the join, when it happens in an EPQ context, to return that tuple.
> But I don't really have a good idea how to do that.
>
> More thought seems needed here...
>
How about an alternative, built-in join execution path?
When it is executed in the EPQ context, a built-in join node fetches one
tuple from each of its inner and outer sides. Those tuples ultimately come
from the EPQ slots, and the alternative join then produces the result
tuple.
If the FDW is not designed to handle the join by itself, I think this is
a reasonable fallback.
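
A rough standalone sketch of that idea in C follows. It is only a model
under stated assumptions: EpqState, BaseTuple, JoinTuple, epq_set_tuple()
and epq_local_join() are hypothetical names, the two arrays merely imitate
es_epqTuple[rti - 1] and es_epqTupleSet[rti - 1], and the join qual is
assumed to be a plain equality on one integer key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RELS 8

/* Hypothetical base-relation row: a join key plus some payload. */
typedef struct { int join_key; int payload; } BaseTuple;

/* Result of the local join: one row from each side. */
typedef struct { BaseTuple outer; BaseTuple inner; } JoinTuple;

/* Per-relation EPQ slots, indexed by range table index minus one. */
typedef struct
{
    const BaseTuple *epq_tuple[MAX_RELS];
    bool             epq_tuple_set[MAX_RELS];
} EpqState;

/* Store the saved copy of a base tuple for relation rti. */
void
epq_set_tuple(EpqState *st, int rti, const BaseTuple *tup)
{
    assert(rti >= 1 && rti <= MAX_RELS);
    st->epq_tuple[rti - 1] = tup;
    st->epq_tuple_set[rti - 1] = true;
}

/*
 * Rebuild the join result locally from the two EPQ slots.
 * Returns true and fills *out only if both slots hold a tuple
 * and the (assumed) equality join qual is satisfied.
 */
bool
epq_local_join(const EpqState *st, int outer_rti, int inner_rti,
               JoinTuple *out)
{
    assert(outer_rti >= 1 && outer_rti <= MAX_RELS);
    assert(inner_rti >= 1 && inner_rti <= MAX_RELS);

    if (!st->epq_tuple_set[outer_rti - 1] ||
        !st->epq_tuple_set[inner_rti - 1])
        return false;

    const BaseTuple *o = st->epq_tuple[outer_rti - 1];
    const BaseTuple *i = st->epq_tuple[inner_rti - 1];

    if (o == NULL || i == NULL)
        return false;               /* slot was set to "no tuple" */
    if (o->join_key != i->join_key)
        return false;               /* join qual fails on recheck */

    out->outer = *o;
    out->inner = *i;
    return true;
}
```

The point of the sketch is that each side contributes at most one tuple,
so the local join degenerates to a single qual evaluation over the saved
base-relation copies.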

I expect an FDW driver needs to handle the EPQ recheck in the cases below:
* ForeignScan on a base relation that uses late row locking.
* ForeignScan on a join relation, even with early row locking.
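
The two cases above can be written down as a small predicate. This is
only an illustrative sketch in standalone C: LockTiming,
fdw_must_handle_epq_recheck() and check_cases() are hypothetical names,
and scanrelid == 0 is taken to mean a ForeignScan on a join relation, as
earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* When the FDW takes its row locks (hypothetical enum). */
typedef enum { LOCK_EARLY, LOCK_LATE } LockTiming;

/* Does the FDW itself have to take part in the EPQ recheck? */
bool
fdw_must_handle_epq_recheck(unsigned scanrelid, LockTiming timing)
{
    /* Join relation: no single saved base tuple is the join result. */
    if (scanrelid == 0)
        return true;

    /* Base relation: only late locking leaves room for changes. */
    return timing == LOCK_LATE;
}

/* Walks through the combinations discussed above. */
void
check_cases(void)
{
    assert(fdw_must_handle_epq_recheck(1, LOCK_LATE));   /* base, late   */
    assert(!fdw_must_handle_epq_recheck(1, LOCK_EARLY)); /* base, early  */
    assert(fdw_must_handle_epq_recheck(0, LOCK_EARLY));  /* join, early  */
    assert(fdw_must_handle_epq_recheck(0, LOCK_LATE));   /* join, late   */
}
```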

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
