Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
Date: 2015-11-20 03:11:58
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801173D02@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mon, Nov 16, 2015 at 9:03 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > I guess we will put a pointer to static ExtensibleNodeMethods structure
> > on ForeignScan, CustomScan, CustomScanState and etc...
>
> I think that makes it confusing. What I'd prefer to do is have only
> the name stored in the node, and then people who need methods can look
> up those methods based on the name.
>
OK. It is equivalent.

> > Let me write down the steps for deserialization. Please correct me if it is
> > not what you are thinking.
> > First, stringToNode picks up a token that shall have node label, like
> > "SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
> > tokens belong to CustomScan node, it can make advance the pg_strtok pointer
> > until "methods" field. The method field is consists of a pair of library_name
> > and symbol_name, then it tries to load the library and resolve the symbol.
>
> No. It reads the "extnodename" field (or whatever we decide to call
> it) and calls GetExtensibleNodeMethods() on that name. That name
> returns the appropriate structure. C libraries that get loaded into
> the server can register their extensible node types at _PG_init()
> time.
>
OK, I got.

> > Even if library was not loaded on the worker side, this step allows to ensure
> > the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.
>
> A parallel worker will load the same loadable modules which the master
> has got loaded. If you use some other kind of background worker, of
> course, you're on your own.
>
Sorry, I oversight it. And SerializeLibraryState() and RestoreLibraryState()
indeed allow background workers even if out of parallel context to restore
the library state easily.

> > I have two concerns on the above proposition.
> > 1) 'node_size' field implies user defined structure to have fixed length.
> > Extension may want to use variable length structure. The example below
> > shows GpuJoinState has multiple innerState structure according to the
> > number of inner relations joined at once.
> > https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240
>
> OK, we can replace the node_size field with an allocator callback:
> Node (*nodeAlloc)(void) or something like that.
>
> > 2) It may be valuable if 'nodeRead(void)' can take an argument of the
> > knows/common portion of ForeignScan and etc... It allows extension
> > to adjust number of private fields according to the known part.
> > For example, I can expect flexible length private fields according
> > to the length of custom_subplans list.
>
> I don't understand this bit.
>
The above two points are for the case if and when extension want to use
variable length fields for its private fields.
So, nodeAlloc() callback is not a perfect answer for the use case because
length of the variable length fields shall be (usually) determined by the
value of another fields (like num_inner_rels, num_gpu_devices, ...) thus
we cannot determine the correct length before read.

Let's assume an custom-scan extension that wants to have:

typedef struct {
CustomScan cscan;
int priv_value_1;
long priv_value_2;
extra_info_t extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
/* its length equal to number of sub-plans */
} ExampleScan;

The "extnodename" within CustomScan allows to pull callback functions
to handle read node from the serialized format.
However, nodeAlloc() callback cannot determine the correct length
(= number of sub-plans in this example) prior to read 'cscan' part.

So, I'd like to suggest _readCustomScan (and other extendable nodes
also) read tokens on local CustomScan variable once, then call
Node *(nodeRead)(Node *local_node)
to allocate entire ExampleScan node and read other private fields.

The local_node is already filled up by the core backend prior to
the callback invocation, so extension can know how many sub-plans
are expected thus how many private tokens shall appear.
It also means extension can know exact length of the ExampleScan
node, so it can allocate the node as expected then fill up
remaining private tokens.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2015-11-20 03:16:23 Re: pglogical_output - a general purpose logical decoding output plugin
Previous Message Etsuro Fujita 2015-11-20 02:00:02 Re: Foreign join pushdown vs EvalPlanQual