Re: Foreign join pushdown vs EvalPlanQual

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-08-07 07:37:42
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80112FF52@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> I could have a discussion with Fujita-san about this topic.
>
Let me also share our discussion toward a complete solution.

The root cause of this problem is that a Scan node with scanrelid==0
represents a join that can involve multiple relations, so the TupleDesc
of its records does not match any base relation; however,
ExecScanFetch() was not updated when scanrelid==0 became supported.

The FDW/CSP behind a Scan node with scanrelid==0 is responsible for
generating records according to the fdw_scan_tlist/custom_scan_tlist
that reflects the definition of the join, and only the FDW/CSP knows
how to combine the base relations.
In addition, host-side expressions (like Plan->qual) are initialized
to reference the records generated by the FDW/CSP, so I think the least
invasive approach is to allow the FDW/CSP to have its own recheck logic.

Below is the structure of ExecScanFetch().

ExecScanFetch(ScanState *node,
              ExecScanAccessMtd accessMtd,
              ExecScanRecheckMtd recheckMtd)
{
    EState     *estate = node->ps.state;

    if (estate->es_epqTuple != NULL)
    {
        /*
         * We are inside an EvalPlanQual recheck.  Return the test tuple if
         * one is available, after rechecking any access-method-specific
         * conditions.
         */
        Index       scanrelid = ((Scan *) node->ps.plan)->scanrelid;

        Assert(scanrelid > 0);
        if (estate->es_epqTupleSet[scanrelid - 1])
        {
            TupleTableSlot *slot = node->ss_ScanTupleSlot;
                :
            return slot;
        }
    }
    return (*accessMtd) (node);
}

When we are inside EPQ, it fetches a tuple from the es_epqTuple[] array
and checks its visibility (ForeignRecheck() always says 'yep, it is
visible'), then ExecScan() applies its qualifiers via ExecQual().
So, as long as the FDW/CSP can return a record that satisfies the
TupleDesc of this relation, built from the tuples in the es_epqTuple[]
array, the rest of the code paths are common.

I have an idea to solve the problem.
It adds a recheckMtd() call for the scanrelid==0 case just before the
assertion above, and adds an FDW callback in ForeignRecheck().
The role of this new callback is to set up the supplied TupleTableSlot
and check its visibility, but the core does not define how to do this.
That is left to the FDW driver, for example by invoking an alternative
plan that consists of only built-in logic.
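
To make the control flow concrete, here is a minimal standalone sketch of
the proposed dispatch. It is only an illustration under simplifying
assumptions: DemoSlot, DemoScanState, demo_scan_fetch() and the two
demo_* callbacks are hypothetical stand-ins, not the real executor
structures or any existing API, and the base-relation path is heavily
abbreviated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; these are NOT the real PostgreSQL definitions. */
typedef struct DemoSlot
{
    bool        empty;          /* true once the slot has been "cleared" */
    int         value;          /* payload standing in for a tuple */
} DemoSlot;

typedef struct DemoScanState DemoScanState;
typedef DemoSlot *(*DemoAccessMtd) (DemoScanState *node);
typedef bool (*DemoRecheckMtd) (DemoScanState *node, DemoSlot *slot);

struct DemoScanState
{
    unsigned int scanrelid;     /* 0 = scan node that represents a join */
    bool        in_epq;         /* stands in for es_epqTuple != NULL */
    const bool *epq_tuple_set;  /* stands in for es_epqTupleSet[] */
    DemoSlot   *scan_slot;      /* stands in for ss_ScanTupleSlot */
};

DemoSlot *
demo_scan_fetch(DemoScanState *node,
                DemoAccessMtd accessMtd,
                DemoRecheckMtd recheckMtd)
{
    if (node->in_epq)
    {
        DemoSlot   *slot = node->scan_slot;

        if (node->scanrelid == 0)
        {
            /* Proposed path: the FDW/CSP fills the slot and rechecks it. */
            if (!(*recheckMtd) (node, slot))
                slot->empty = true;
            return slot;
        }

        assert(node->scanrelid > 0);
        if (node->epq_tuple_set[node->scanrelid - 1])
        {
            /* Existing base-relation path, heavily abbreviated here. */
            if (!(*recheckMtd) (node, slot))
                slot->empty = true;
            return slot;
        }
    }
    return (*accessMtd) (node);
}

/* Hypothetical callbacks, used only to exercise the sketch. */
DemoSlot *
demo_access(DemoScanState *node)
{
    node->scan_slot->empty = false;
    node->scan_slot->value = 1;     /* result of a "regular" scan */
    return node->scan_slot;
}

bool
demo_join_recheck(DemoScanState *node, DemoSlot *slot)
{
    (void) node;
    slot->empty = false;
    slot->value = 42;               /* pretend joined record from EPQ tuples */
    return true;
}
```

The point of the sketch is only that the scanrelid==0 case never indexes
the per-relation EPQ arrays; everything about how the slot gets filled
stays behind the recheck callback.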

Invoking an alternative plan is one of the most feasible ways to
implement EPQ logic in an FDW, so I think FDW also needs a mechanism
that takes child path nodes, like custom_paths in the CustomPath node.
Once a valid path node is linked to this list, createplan.c transforms
it into the relevant plan node, and the FDW can then initialize and
invoke this plan node during execution, e.g. from ForeignRecheck().
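
As a rough illustration of what "invoking an alternative plan" could look
like from the FDW side, the sketch below keeps a local join routine in
the scan state and runs it over the EPQ test tuples to fill the result
slot. All names here (DemoRow, DemoJoined, DemoLocalPlan,
DemoForeignJoinState, demo_local_equijoin, demo_recheck_foreign_join)
are hypothetical, and a function pointer stands in for a real child
plan node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row and joined-record layouts, for illustration only. */
typedef struct DemoRow
{
    int         key;
    int         payload;
} DemoRow;

typedef struct DemoJoined
{
    int         key;
    int         left_payload;
    int         right_payload;
} DemoJoined;

/* Stand-in for a local plan node built from a child path. */
typedef struct DemoLocalPlan
{
    bool        (*exec) (const DemoRow *outer, const DemoRow *inner,
                         DemoJoined *out);
} DemoLocalPlan;

/* Stand-in for the FDW's private execution state of a pushed-down join. */
typedef struct DemoForeignJoinState
{
    DemoLocalPlan *epq_plan;        /* alternative plan, built-in logic only */
    const DemoRow *epq_tuples[2];   /* stands in for es_epqTuple[] entries */
} DemoForeignJoinState;

/* Built-in style equi-join over exactly one outer and one inner row. */
bool
demo_local_equijoin(const DemoRow *outer, const DemoRow *inner,
                    DemoJoined *out)
{
    if (outer->key != inner->key)
        return false;               /* join clause fails: no record */
    out->key = outer->key;
    out->left_payload = outer->payload;
    out->right_payload = inner->payload;
    return true;
}

/* Recheck callback: rebuild the joined record by running the local plan. */
bool
demo_recheck_foreign_join(DemoForeignJoinState *state, DemoJoined *slot)
{
    return state->epq_plan->exec(state->epq_tuples[0],
                                 state->epq_tuples[1],
                                 slot);
}
```

In this sketch the callback both constructs the joined record and reports
whether it still qualifies, which matches the two duties described for
the new callback.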

This design can also solve another problem Fujita-san mentioned.
If a scan qualifier is pushed down to the remote query and its expression
node is saved in the private area of ForeignScan, the callback in
ForeignRecheck() can evaluate the qualifier by itself. (Note that only
the FDW driver knows where and how the pushed-down expression node
is saved in the private area.)
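
A small sketch of that base-relation case, assuming the driver keeps the
pushed-down qualifier in its own private structure. DemoTuple, DemoQual,
DemoFdwPrivate, demo_price_gt and demo_recheck_foreign_scan are all
hypothetical names, and a plain function pointer stands in for a real
expression node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuple layout, for illustration only. */
typedef struct DemoTuple
{
    int         id;
    int         price;
} DemoTuple;

/* Stand-in for an expression node: a predicate plus its constant argument. */
typedef bool (*DemoQual) (const DemoTuple *tup, const void *arg);

/* Stand-in for the FDW driver's private area attached to the scan. */
typedef struct DemoFdwPrivate
{
    DemoQual    pushed_qual;    /* qualifier that was sent to the remote side */
    const void *qual_arg;       /* constant used by the qualifier */
} DemoFdwPrivate;

/* Example qualifier: "price > threshold". */
bool
demo_price_gt(const DemoTuple *tup, const void *arg)
{
    return tup->price > *(const int *) arg;
}

/*
 * Recheck callback: only the driver knows where the pushed-down qualifier
 * lives, so it evaluates it locally against the EPQ test tuple.
 */
bool
demo_recheck_foreign_scan(const DemoFdwPrivate *priv, const DemoTuple *tup)
{
    if (priv->pushed_qual == NULL)
        return true;            /* nothing was pushed down */
    return priv->pushed_qual(tup, priv->qual_arg);
}
```

With no pushed-down qualifier the sketch degenerates to the current
behavior, where the recheck always passes.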

In summary, the following three enhancements are a straightforward
way to fix the problem he reported.
1. Add a special path to call recheckMtd in ExecScanFetch if scanrelid==0
2. Add an FDW callback in ForeignRecheck() - to construct a record
   according to the fdw_scan_tlist definition and to evaluate its
   visibility, or to evaluate the pushed-down qualifier for a base relation.
3. Add List *fdw_paths to ForeignPath, like custom_paths of CustomPath,
   to construct plan nodes for EPQ evaluation.

On the other hand, we also need to pay attention to the development
timeline. This is really a v9.5 problem; however, it looks to me like
the straightforward solution needs an enhancement of the FDW APIs.

I'd like to hear people's comments.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> Sent: Saturday, August 01, 2015 10:35 PM
> To: Robert Haas; Etsuro Fujita
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> > On Fri, Jul 3, 2015 at 6:25 AM, Etsuro Fujita
> > <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > Can't FDWs get the join information through the root, which I think we would
> > > pass to the API as the argument?
> >
> > This is exactly what Tom suggested originally, and it has some appeal,
> > but neither KaiGai nor I could see how to make it work. Do you have
> > an idea? It's not too late to go back and change the API.
> >
> > The problem that was bothering us (or at least what was bothering me)
> > is that the PlannerInfo provides only a list of SpecialJoinInfo
> > structures, which don't directly give you the original join order. In
> > fact, min_righthand and min_lefthand are intended to constrain the
> > *possible* join orders, and are deliberately designed *not* to specify
> > a single join order. If you're sending a query to a remote PostgreSQL
> > node, you don't want to know what all the possible join orders are;
> > it's the remote side's job to plan the query. You do, however, need
> > an easy way to identify one join order that you can use to construct a
> > query. It didn't seem easy to do that without duplicating
> > make_join_rel(), which seemed like a bad idea.
> >
> > But maybe there's a good way to do it. Tom wasn't crazy about this
> > hook both because of the frequency of calls and also because of the
> > long argument list. I think those concerns are legitimate; I just
> > couldn't see how to make the other way work.
> >
> I could have a discussion with Fujita-san about this topic.
> He has an idea to tackle this matter that is a little bit tricky,
> but I have no clear reason to reject it.
> At the line just above set_cheapest() in standard_join_search(),
> at least one built-in join path has already been added to the RelOptInfo,
> so the FDW driver can reference the cheapest path found by the built-in
> logic, and its lefttree and righttree that construct the joinrel.
> The assumption is that the best path found by the built-in logic gives
> a join order at least reasonable enough compared with other potential ones.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
