Re: Foreign join pushdown vs EvalPlanQual

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-09-03 00:41:23
Message-ID: CA+TgmoYtONqpWxqNzOuWnyqoBheOMW6gUpET82QWP84MtjjLww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

, On Wed, Aug 26, 2015 at 4:05 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>> On 2015/08/26 16:07, Kouhei Kaigai wrote:
>> I wrote:
>> >> Maybe I'm missing something, but why do we need such a flexiblity for
>> >> the columnar-stores?
>>
>> > Even if we enforce them a new interface specification comfortable to RDBMS,
>> > we cannot guarantee it is also comfortable to other type of FDW drivers.
>>
>> Specifically, what kind of points about the patch are specific to RDBMS?
>>
>
> *** 88,93 **** ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
> --- 99,122 ----
> TupleTableSlot *
> ExecForeignScan(ForeignScanState *node)
> {
> + EState *estate = node->ss.ps.state;
> +
> + if (estate->es_epqTuple != NULL)
> + {
> + /*
> + * We are inside an EvalPlanQual recheck. If foreign join, get next
> + * tuple from subplan.
> + */
> + Index scanrelid = ((Scan *) node->ss.ps.plan)->scanrelid;
> +
> + if (scanrelid == 0)
> + {
> + PlanState *outerPlan = outerPlanState(node);
> +
> + return ExecProcNode(outerPlan);
> + }
> + }
> +
> return ExecScan((ScanState *) node,
> (ExecScanAccessMtd) ForeignNext,
> (ExecScanRecheckMtd) ForeignRecheck);
>
> It might not be specific to RDBMS, however, we cannot guarantee all the FDW are
> comfortable to run the alternative plan node on EPQ recheck.
> This design does not allow FDW drivers to implement own EPQ recheck, possibly
> more efficient than built-in logic.

I'm not convinced that this problem is more than hypothetical. EPQ
rechecks should be quite rare, so it shouldn't really matter if we
jump through a few extra hoops when they happen. And really, are
those hoops all that expensive? It's not as if ExecInitNode should be
doing any sort of expensive operation, or ExecEndScan either. And
they will be able to tell if they're being called for an EPQ-recheck
by fishing out the estate, so if there's some processing that they
want to short-circuit for that case, they can. So I'm not seeing the
problem. Do you have any evidence that either the performance cost or
the code complexity cost is significant for PG-Strom or any other
extension?

That having been said, I don't entirely like Fujita-san's patch
either. Much of the new code is called immediately adjacent to an FDW
callback which could pretty trivially do the same thing itself, if
needed. And much of it is contingent on whether estate->es_epqTuple
!= NULL and scanrelid == 0, but perhaps out would be better to check
whether the subplan is actually present instead of checking whether we
think it should be present. Also, the naming is a bit weird:
node->fs_subplan gets shoved into outerPlanState(), which seems like a
kludge.

I'm wondering if there's another approach. If I understand correctly,
there are two reasons why the current situation is untenable. The
first is that ForeignRecheck always returns true, but we could instead
call an FDW-supplied callback routine there. The callback could be
optional, so that we just return true if there is none, which is nice
for already-existing FDWs that then don't need to do anything. The
second is that ExecScanFetch needs scanrelid > 0 so that
estate->es_epqTupleSet[scanrelid - 1] isn't indexing off the beginning
of the array, and similarly estate->es_epqScanDone[scanrelid - 1] and
estate->es_epqTuple[scanrelid - 1]. But, waving my hands wildly, that
also seems like a solvable problem. I mean, we're joining a non-empty
set of relations, so the entries in the EPQ-related arrays for those
RTIs are not getting used for anything, so we can use any of them for
the joinrel. We need some way for this code to decide what RTI to
use, but that shouldn't be too hard to finagle.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-09-03 00:44:05 Re: exposing pg_controldata and pg_config as functions
Previous Message Andres Freund 2015-09-03 00:26:05 Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore