Re: Custom Scans and private data

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Kohei KaiGai" <kaigai(at)kaigai(dot)gr(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Scans and private data
Date: 2015-08-26 23:55:16
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8011389A0@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 2015-08-26 00:55:48 +0000, Kouhei Kaigai wrote:
> > As Tom pointed out, the primary reason why CustomScan required provider
> > to save its private data on custom_exprs/custom_private were awareness
> > of copyObject().
>
> Well, a callback brings that with it as well. I do think it makes sense
> to *allow* not to have a callback and rely on copyObject() to do the
> work.
>
> > In addition, custom_exprs are expected to link expression node to be
> > processed in setrefs.c and subselect.c because core implementation
> > cannot know which type of data is managed in private.
>
> Right.
>
> > Do you concern about custom_private only?
>
> Yes, pretty much. There's very little chance you can expand the
> expression tree with custom expressions and survive the experience. So
> custom_exprs etc. makes sense.
>
I can understand your concern, however, I'm not certain which is the best
way to do.
Regarding of CustomPath and CustomScanState, we expect developer defines
its own data structure delivered from these types, like:

typedef struct {
CustomPath cpath;
List *host_quals; /* RestrictInfo run on host */
List *dev_quals; /* RestrictInfo run on device */
} GpuScanPath;

On the other hand, CustomScan node prohibit to have these manner because
of copyObject() awareness. If we can have similar design, it looks to me
harmonious design.

Here is my random thought to realize it, even though current specification
require CustomScan to follow existing copyObject() manner.

- _copyCustomScan() needs to allow to allocate larger than sizeof(CustomScan),
according to the requirement by custom-scan provider.
- We may need to have a utility function to copy the common part.
It is not preferable to implement by custom-scan provider itself.
- For upcoming readfuncs.c support, I don't want to have READ_XXX_FIELD()
macro on the extension side. Even though pg_strtok() is a public function,
_readBitmapset() is static function.
- Is custom_private deprecated? Also, do we force to have CopyObjectCustomScan()
and potentially TextReadCustomScan()?
I want to keep custom_private as-is, because of very simple custom-scan
provider which has at most several primitive values as private one.

> > Even if we have extra
> > callbacks like CopyObjectCustomScan() and TextReadCustomScan(),
> > how do we care about the situation when core implementation needs to
> > know the location of expression nodes? Is custom_exprs retained as is?
>
> Yes.
>
> > In the earlier version of CustomScan interface had individual
> > callbacks on setrefs.c and subselect.c, however, its structure
> > depended on the current implementation too much, then we moved
> > to the implementation to have two individual private fields
> > rather than callbacks on outfuncs.c.
>
> I agree with that choice.
>
> > On the other hands, I'm inclined to think TextOutCustomScan() might
> > be a misdesign to support serialize/deserialize via readfuncs.c.
> >
> http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80111D04
> F(at)BPXM15GP(dot)gisp(dot)nec(dot)co(dot)jp
> > I think it shall be deprecated rather then enhancement.
>
> Well, right now there's no support for reading/writing plans at all. But
> if we add it, TextOutCustomScan() seems like a small problem in
> comparison to others. CustomScan contains pointers, that's definitely
> not something you can ship over the wire and expect to work. We'll
> probably have to store a soname + function name instead.
>
It may not be a long future. The ParallelAppend feature, I've discussed
in another thread, needs capability of plan tree serialization, and under
the development:

https://github.com/kaigai/sepgsql/blob/fappend/src/backend/nodes/readfuncs.c#L1506

I thought we may need to define a new DDL to associate a particular custom-
scan name with a set of callback pointer table, using persistent system
catalog. Your idea, soname + symbol name, may be a solution.

> More generally I rather doubt that it'll always make sense to
> serialize/deserialize in a generic manner between different backends. It
> very well can make sense to refer to backend-local state in a plan - you
> need to be able to take that into account upon
> serialization/deserialization. Consider e.g. a connection id for an FDW
> that uses pooling.
>
+1. I also prefer more generic mechanism to serialize/deserialize, rather
than custom-scan specific.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-08-27 00:02:41 Re: Our trial to TPC-DS but optimizer made unreasonable plan
Previous Message Peter Geoghegan 2015-08-26 23:31:00 Re: Our trial to TPC-DS but optimizer made unreasonable plan