Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-10-01 13:17:34
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80114D7BB@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Etsuro Fujita
> Sent: Thursday, October 01, 2015 5:50 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/01 11:15, Kouhei Kaigai wrote:
> >> From: pgsql-hackers-owner(at)postgresql(dot)org
> >> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Robert Haas
> >> On Mon, Sep 28, 2015 at 11:15 PM, Etsuro Fujita
> >> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >>> I thought the same thing [1]. While I thought it was relatively easy to
> >>> make changes to RefetchForeignRow that way for the foreign table case
> >>> (scanrelid>0), I was not sure how hard it would be to do so for the foreign
> >>> join case (scanrelid==0). So, I proposed to leave that changes for 9.6.
> >>> I'll have a rethink on this issue along the lines of that approach.
>
> >> So, if we wanted to fix this in a way that preserves the spirit of
> >> what's there now, it seems to me that we'd want the FDW to return
> >> something that's like a whole row reference, but represents the output
> >> of the foreign join rather than some underlying base table. And then
> >> get the EPQ machinery to have the evaluation of the ForeignScan for
> >> the join, when it happens in an EPQ context, to return that tuple.
> >> But I don't really have a good idea how to do that.
>
> > Alternative built-in join execution?
> > Once it is executed under the EPQ context, built-in join node fetches
> > a tuple from both of inner and outer side for each. It is eventually
> > fetched from the EPQ slot, then the alternative join produce a result
> > tuple.
> > In case when FDW is not designed to handle join by itself, it is
> > a reasonable fallback I think.
> >
> > I expect FDW driver needs to handle EPQ recheck in the case below:
> > * ForeignScan on base relation and it uses late row locking.
> > * ForeignScan on join relation, even if early locking.
>
> I also think the approach would be one choice. But one thing I'm
> concerned about is plan creation for that by the FDW author; that would
> make life hard for the FDW author. (That was proposed by me ...)
>
I don't follow that standpoint, but it is not worthwhile to repeat the same discussion.

> So, I'd like to investigate another approach that preserves the
> applicability of late row locking to the join pushdown case as well as
> the spirit of what's there now. The basic idea is (1) add a new
> callback routine RefetchForeignJoinRow that refetches one foreign-join
> tuple from the foreign server, after locking remote tuples for the
> component foreign tables if required, and (2) call that routine in
> ExecScanFetch if the target scan is for a foreign join and the component
> foreign tables require to be locked lately, else just return the
> foreign-join tuple stored in the parent's state tree, which is the tuple
> mentioned by Robert, for preserving the spirit of what's there now. I
> think that ExecLockRows and EvalPlanQualFetchRowMarks should probably be
> modified so as to skip foreign tables involved in a foreign join.
>
As long as the FDW author can choose their own best way to produce a joined
tuple, it may be worth investigating.

My comments are:
* ForeignRecheck is the best place to call RefetchForeignJoinRow
when scanrelid==0, not ExecScanFetch. Why add a special case for
FDW in the common routine?
* It is the FDW's choice where the remote join tuple is kept, even though
most FDWs will keep it in a private field of ForeignScanState.
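
To illustrate the two comments above, here is a rough standalone sketch of a recheck routine that dispatches on scanrelid. It is a toy model, not PostgreSQL code: every Toy* type and toy_* function is hypothetical, RefetchForeignJoinRow is only the callback proposed in this thread, and the late_locking flag and fdw_state field are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are PostgreSQL's real definitions. */
typedef struct ToyTuple { int id; } ToyTuple;
typedef struct ToyForeignScanState ToyForeignScanState;

/* Hypothetical callback table, playing the role of an FDW routine struct. */
typedef struct ToyFdwRoutine
{
    /* Proposed callback: refetch one joined tuple; NULL if it is gone. */
    ToyTuple *(*RefetchForeignJoinRow) (ToyForeignScanState *node);
} ToyFdwRoutine;

struct ToyForeignScanState
{
    unsigned int scanrelid;           /* 0 means the scan represents a join */
    bool        late_locking;         /* assumed flag: rows are locked late */
    const ToyFdwRoutine *fdwroutine;  /* callbacks supplied by the FDW */
    void       *fdw_state;            /* FDW-private; here it holds the saved join tuple */
    ToyTuple   *result;               /* tuple handed back to the recheck caller */
};

/* Example FDW callback: pretends to refetch the joined row remotely. */
static ToyTuple *
toy_refetch_join_row(ToyForeignScanState *node)
{
    static ToyTuple refetched = { 42 };

    (void) node;                /* a real FDW would consult its private state */
    return &refetched;
}

static const ToyFdwRoutine toy_routine = { toy_refetch_join_row };

/*
 * Recheck hook: the join-specific logic lives here, behind the FDW's own
 * recheck entry point, rather than in the common scan-fetch routine.
 */
static bool
toy_foreign_recheck(ToyForeignScanState *node)
{
    assert(node != NULL);

    if (node->scanrelid > 0)
        return true;            /* base relation: nothing join-specific to do */

    if (node->late_locking)
    {
        /* Late locking: ask the FDW to refetch the joined tuple. */
        if (node->fdwroutine == NULL ||
            node->fdwroutine->RefetchForeignJoinRow == NULL)
            return false;
        node->result = node->fdwroutine->RefetchForeignJoinRow(node);
    }
    else
    {
        /* Early locking: reuse the tuple the FDW kept in its private field. */
        node->result = (ToyTuple *) node->fdw_state;
    }

    return node->result != NULL;
}
```

The point of the sketch is only that the scanrelid==0 branch stays inside the FDW-facing recheck hook, and that where the saved join tuple lives (fdw_state here) remains the FDW's decision.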

Apart from the FDW requirement, custom-scan/join needs recheckMtd to be
called when scanrelid==0 to avoid an assertion failure. I hope FDW has a
symmetric structure; however, that is not a mandatory requirement for me.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
