Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-03-06 04:22:03
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F85B8E@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> >> * Please drop the whole register_custom_provider/get_custom_provider
> API.
>
> > One thing I was worrying about is how copyObject() and nodeToString()
> > support set of function pointer tables around custom-scan node,
> > however, you suggested to change the assumption here. So, I think these
> functions become unnecessary.
>
> If we allow the extension to control copyObject behavior, it can do what
> it likes with the function-struct pointer. I think the typical case would
> be that it's a simple pointer to a never-copied static constant. But you
> could imagine that it's a pointer to a struct embedded further down in the
> custom path or plan node, if the extension wants different functions for
> different plans.
>
> If we had to support stringToNode() for paths or plans, things would get
> much more complicated, but we don't (and there are already lots of other
> things that would be difficult for that).
>
I expected to include simple function pointers for copying and text-output
as follows:

typedef struct {
Plan plan;
:
NodeCopy_function node_copy;
NodeTextOut_function node_textout;
} Custom;

Then, sub-routine in copyfuncs.c shall be:
static Custom *
_copyCustom(const Custom *from)
{
return from->node_copy(from);
}

Can I share same image for this? It allows Custom node to have polymorphism
on the node it enhanced.

Sorry, I got little bit confused. Is the function-struct pointer you
mentioned something different from usual function pointer?

> > The reason why CustomScan is derived from Scan is, some of backend
> > code wants to know rtindex of the relation to be referenced by this
> CustomScan.
> > The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c
> > and explain.c. The usage in nodeCustom.c is just for service routines,
> > and the usage in setrefs.c can be moved to the extension according to
> your suggestion.
> > We need to investigate the usage in explain.c; ExplainPreScanNode()
> > walks around the nodes to collect relids referenced in this query. If
> > we don't want to put a callback for this specific usage, it is a
> > reasonable choice to show the backend the associated scanrelid of
> CustomScan.
>
> I think we have to add another callback for that :-(. It's a pain since
> it's such a trivial point; but the existing code cannot support a custom
> node referencing more than one RTE, which seems possible for join or append
> type cases.
>
It's more generic approach, I like this.
Probably, it can kill the characteristic as Scan of CustomScan from the
view of core backend. It shall perform as an opaque Plan node that may
scan, join, sort or something, so more appropriate its name may be
CustomPlan or simply Custom.

> > Probably, the hunk around show_customscan_info() call can be entirely
> > moved to the extension side. If so, I want ExplainNode() being an
> > extern function, because it allows extensions to print underlying
> plan-nodes.
>
> I haven't looked at what explain.c would have to expose to make this workable,
> but yeah, we will probably have to export a few things.
>
OK,

> > Are you saying the hard-wired portion in setrefs.c can be moved to the
> > extension side? If fix_scan_expr() become extern function, I think it
> > is feasible.
>
> My recollection is that fix_scan_expr() is a bit specialized. Not sure
> exactly what we'd have to export there --- but we'd have to do it anyway.
> What you had in the patch was a hook that could be called, but no way for
> it to do what it would likely need to do.
>
It probably needs to be exported. It walks on the supplied node tree and
eventually calls record_plan_function_dependency() for each functions being
found. It should not be invented in the extension again.
Anyway, my reworking on the patch will make clear which static functions
need to be exposed. Please wait for a while.

> >> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
>
> > In case of Var-node that references joined-relations, it shall be
> > replaced to either INNER_VAR or OUTER_VAR according the location of
> > underlying relations. It eventually makes ExecEvalScalarVar() to
> > reference either ecxt_innertuple or ecxt_outertuple, however, it is
> > problematic if we already consolidated tuples come from the both side
> into one.
>
> So? If you did that, then you wouldn't have renumbered the Vars as
> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
> is needed, then there would also be a need for an additional tuple slot
> in executor contexts, which you haven't provided.
>
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
>
> That would be a scan situation, and the vars could reference the scan tuple
> slot. Which in fact was the implementation you were using, so how is
> CUSTOM_VAR adding anything?
>
Let me sort out the points.
If custom-node performs as join node with two underlying relations thus
it could retrieve two tuples, here is no matter because INNER/OUTER_VAR
can reference individual tuple slot.
Also, custom-node performs as scan node with one underlying relations,
here is also no matter because all the Var nodes in the target-list
references attributes of a particular relation.
An confusing scenarios is that custom-node performs as scan node as an
alternative of built-in join, thus Var-nodes in the target-list may
reference multiple relations, however, it will have only one tupleslot
as like remote-join in postgres_fdw doing.
In this case, we may be able to use existing INNER/OUTER/INDEX var if
we renumber the varattno and use an appropriate slot, instead of adding
a special varno for this.

I'd like to investigate it little more, but I'm inclined to conclude
CUSTOM_VAR might not be necessary, as you suggested.

> >> * Why is there more than one call site for add_scan_path_hook? I
> >> don't see the need for the calling macro with the randomly
> >> inconsistent name, either.
>
> > Where is the best place to do? Even though I cannot imagine the
> > situation to run sub-query or cte by extensions, its path is
> > constructed during set_rel_size(), so I had to put the hook for each
> > set_xxxx_pathlist() functions.
>
> Hm. We could still call the hook in set_rel_pathlist, if we were to get
> rid of the individual calls to set_cheapest and do that in one spot at the
> bottom of set_rel_pathlist (after the hook call). Calling set_cheapest
> in one place seems more consistent anyway.
>
OK, I'll try to move the set_cheapest() to set_rel_pathlist from the
current positions; being distributed to several functions.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-03-06 04:24:57 Re: pg_ctl status with nonexistent data directory
Previous Message Noah Misch 2014-03-06 03:56:36 Re: Securing "make check" (CVE-2014-0067)