Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
Date: 2015-11-17 02:03:04
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80117205C@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > I agree with we have no reason why only custom-scan is allowed to have
> > serialize/deserialize capability. I can implement an equivalent stuff
> > for foreign-scan also, and it is helpful for extension authors, especially,
> > who tries to implement remote join feature because FDW driver has to
> > keep larger number of private fields than scan.
> >
> > Also, what is the reason why we allow extensions to define a larger
> > structure which contains CustomPath or CustomScanState? It seems to
> > me that these types are not (fully) supported by the current copyfuncs,
> > outfuncs and readfuncs, aren't it?
> > (Although outfuncs.c supports path-nodes, thus CustomPathMethods has
> > TextOut callback but no copy/read handler at this moment.)
>
> I feel like we're sort of plunging blindly down a path of trying to
> make certain parts of the system extensible and pluggable without
> really having a clear vision of where we want to end up. Somehow I
> feel like we ought to start by thinking about a framework for *node*
> extensibility; then, within that, we can talk about what that means
> for particular types of nodes.
>
> For example, suppose do something like this:
>
> typedef struct
> {
> NodeTag tag;
> char *extnodename;
> } ExtensibleNode;
>
> typedef stuct
> {
> char *extnodename;
> char *library_name;
> char *function_name;
> Size node_size;
> ExtensibleNode *(*nodeCopy)(ExtensibleNode *);
> bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *);
> void (*nodeOut)(StringInfo str, ExtensibleNode *);
> ExtensibleNode (*nodeRead)(void);
> } ExtensibleNodeMethods;
>
> extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *);
> extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename);
>
> This provides a generic infrastructure so that we don't have to keep
> inventing the same stuff over and over, rather than coming up with one
> way to do it for ForeignScans and another way for CustomScan and
> another way for CustomScanState.
>
Wow! I love the idea.

I guess we will put a pointer to static ExtensibleNodeMethods structure
on ForeignScan, CustomScan, CustomScanState and etc...

Like:
typedef struct ForeignScan
{
Scan scan;
Oid fs_server; /* OID of foreign server */
List *fdw_exprs; /* expressions that FDW may evaluate */
List *fdw_private; /* private data for FDW */
List *fdw_scan_tlist; /* optional tlist describing scan tuple */
List *fdw_recheck_quals; /* original quals not in scan.plan.qual */
Bitmapset *fs_relids; /* RTIs generated by this scan */
bool fsSystemCol; /* true if any "system column" is needed */
+ const ExtensibleNodeMethods *methods;
} ForeignScan;

We may be able to put ExtensibleNodeMethods on the head of CustomScanMethods
structure in case of CustomScan.

Let me write down the steps for deserialization. Please correct me if it is
not what you are thinking.
First, stringToNode picks up a token that shall have node label, like
"SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
tokens belong to CustomScan node, it can make advance the pg_strtok pointer
until "methods" field. The method field is consists of a pair of library_name
and symbol_name, then it tries to load the library and resolve the symbol.
Even if library was not loaded on the worker side, this step allows to ensure
the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.
Next, it looks up the table of extensible nodes by a pair of extnodename,
library_name and symbol_name, to get ExtensibleNodeMethods table in this
process address space.
Once it can get nodeRead() callback, we can continue deserialization of
private fields.

I have two concerns on the above proposition.
1) 'node_size' field implies user defined structure to have fixed length.
Extension may want to use variable length structure. The example below
shows GpuJoinState has multiple innerState structure according to the
number of inner relations joined at once.
https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240

2) It may be valuable if 'nodeRead(void)' can take an argument of the
knows/common portion of ForeignScan and etc... It allows extension
to adjust number of private fields according to the known part.
For example, I can expect flexible length private fields according
to the length of custom_subplans list.

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-11-17 02:49:52 Re: Parallel Seq Scan
Previous Message Robert Haas 2015-11-17 01:45:56 Re: Freeze avoidance of very large table.