Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-09-03 13:15:23
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80113C052@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> , On Wed, Aug 26, 2015 at 4:05 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> >> On 2015/08/26 16:07, Kouhei Kaigai wrote:
> >> I wrote:
> >> >> Maybe I'm missing something, but why do we need such a flexiblity for
> >> >> the columnar-stores?
> >>
> >> > Even if we enforce them a new interface specification comfortable to RDBMS,
> >> > we cannot guarantee it is also comfortable to other type of FDW drivers.
> >>
> >> Specifically, what kind of points about the patch are specific to RDBMS?
> >>
> >
> > *** 88,93 **** ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
> > --- 99,122 ----
> > TupleTableSlot *
> > ExecForeignScan(ForeignScanState *node)
> > {
> > + EState *estate = node->ss.ps.state;
> > +
> > + if (estate->es_epqTuple != NULL)
> > + {
> > + /*
> > + * We are inside an EvalPlanQual recheck. If foreign join,
> get next
> > + * tuple from subplan.
> > + */
> > + Index scanrelid = ((Scan *)
> node->ss.ps.plan)->scanrelid;
> > +
> > + if (scanrelid == 0)
> > + {
> > + PlanState *outerPlan = outerPlanState(node);
> > +
> > + return ExecProcNode(outerPlan);
> > + }
> > + }
> > +
> > return ExecScan((ScanState *) node,
> > (ExecScanAccessMtd) ForeignNext,
> > (ExecScanRecheckMtd)
> ForeignRecheck);
> >
> > It might not be specific to RDBMS, however, we cannot guarantee all the FDW
> are
> > comfortable to run the alternative plan node on EPQ recheck.
> > This design does not allow FDW drivers to implement own EPQ recheck, possibly
> > more efficient than built-in logic.
>
> I'm not convinced that this problem is more than hypothetical. EPQ
> rechecks should be quite rare, so it shouldn't really matter if we
> jump through a few extra hoops when they happen. And really, are
> those hoops all that expensive? It's not as if ExecInitNode should be
> doing any sort of expensive operation, or ExecEndScan either. And
> they will be able to tell if they're being called for an EPQ-recheck
> by fishing out the estate, so if there's some processing that they
> want to short-circuit for that case, they can. So I'm not seeing the
> problem. Do you have any evidence that either the performance cost or
> the code complexity cost is significant for PG-Strom or any other
> extension?
>
Even though PG-Strom does not implement EPQ recheck mechanism yet
(and not implemented on top of FDW), I plan to re-use CPU fallback
mechanism (*1) rather than having alternative plan approach.
I also don't care about performance penalty, however, don't want to
have alternative plan because of code complexity.
I don't deny individual extensions have alternative path by their
decision, but should not be enforced.

(*1) GPU often cannot execute expression because of exceptional
data like very long numeric or external toast etc..., but to be
executable. In this case, PG-Strom evaluates this expression in
the CPU side (of course, it is worse than normal execution path
but better than error). This logic is almost same as what we need
on EPQ recheck.

> I'm wondering if there's another approach. If I understand correctly,
> there are two reasons why the current situation is untenable. The
> first is that ForeignRecheck always returns true, but we could instead
> call an FDW-supplied callback routine there. The callback could be
> optional, so that we just return true if there is none, which is nice
> for already-existing FDWs that then don't need to do anything. The
> second is that ExecScanFetch needs scanrelid > 0 so that
> estate->es_epqTupleSet[scanrelid - 1] isn't indexing off the beginning
> of the array, and similarly estate->es_epqScanDone[scanrelid - 1] and
> estate->es_epqTuple[scanrelid - 1]. But, waving my hands wildly, that
> also seems like a solvable problem. I mean, we're joining a non-empty
> set of relations, so the entries in the EPQ-related arrays for those
> RTIs are not getting used for anything, so we can use any of them for
> the joinrel. We need some way for this code to decide what RTI to
> use, but that shouldn't be too hard to finagle.
>
ForeignScan->fs_relids and CustomScan->custom_relids know which RTIs
shall be involved in this joinrel.

However, only extension know how these relations (including the case
of N-way join) shall be joined. FDW drivers may keep joinrestrictinfo
in their comfortable way, like a compiled GPU native binary, so I don't
think core side can do something relevant reasonably.
Even though Fujita-san proposed a new special fields in ForeignScan
to attach expression node that was pushed down, however, it looks to
me interface contract makes more complicated. Rather than various
special purpose fields, it is more straightforward to call back
extension when scanrelid==0. We can provide equivalent feature as
a utility function that has capability Fujita-san wants.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2015-09-03 13:28:33 Re: [PATCH] Microvacuum for gist.
Previous Message Merlin Moncure 2015-09-03 13:13:52 Re: Allow a per-tablespace effective_io_concurrency setting