Re: [v9.5] Custom Plan API

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-08-23 01:48:20
Message-ID: CADyhKSXcv=8Lv6oBMEW5qf8u5KnuZ3b1uwnrOiE8-KLpd57_vQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-08-23 0:39 GMT+09:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>>> I haven't followed this at all, but I just skimmed over it and noticed
>>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> before. It seems a bit odd to me; why isn't it sufficient to have a
>>> boolean flag in regular CustomPlan to indicate that it supports
>>> mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the
>> list of issues that were there last time I looked at this ...
>
> I think the threshold question for this incarnation of the patch is
> whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
> a way of installing new plan providers into the database. If we are,
> then I can go ahead and enumerate a long list of things that will need
> to be fixed to make that code acceptable (such as adding pg_dump
> support). But if we're not, there's no point in spending any time on
> that part of the patch.
>
Even though I'm patch author, I'd like to agree with this approach.
In fact, the previous custom-plan interface I proposed to the v9.4
development cycle does not include DDL support to register the
custom-plan providers, however, it works fine.

One thing I was pointed out, it is the reason why I implemented
DDL support, is that intermediation of c-language function also
loads the extension module implicitly. It is an advantage, but
not sure whether it shall be supported from the beginning.

> I can see a couple of good reasons to think that this approach might
> be reasonable:
>
> - In some ways, a custom plan provider (really, at this point, a
> custom scan provider) is very similar to a foreign data wrapper. To
> the guts of PostgreSQL, an FDW is a sort of black box that knows how
> to scan some data not managed by PostgreSQL. A custom plan provider
> is similar, except that it scans data that *is* managed by PostgreSQL.
>
> - There's also some passing resemblance between a custom plan provider
> and an access method. Access methods provide a set of APIs for fast
> access to data via an index, while custom plan providers provide an
> API for fast access to data via magic that the core system doesn't
> (and need not) understand. While access methods don't have associated
> SQL syntax, they do include catalog structure, so perhaps this should
> too, by analogy.
>
> All that having been said, I'm having a hard time mustering up
> enthusiasm for this way of doing things. As currently constituted,
> the pg_custom_plan_provider catalog contains only a name, a "class"
> that is always 's' for scan, and a handler function OID. Quite
> frankly, that's a whole lot of nothing. If we got rid of the
> pg_catalog structure and just had something like
> RegisterCustomPlanProvider(char *name, void (*)(customScanArg *),
> which could be invoked from _PG_init(), hundreds and hundreds of lines
> of code could go away and we wouldn't lose any actual functionality;
> you'd just list your custom plan providers in shared_preload_libraries
> or local_preload_libraries instead of listing them in a system
> catalog. In fact, you might even have more functionality, because you
> could load providers into particular sessions rather than system-wide,
> which isn't possible with this design.
>
Indeed. It's an advantage of the approach without system catalog.

> I think the underlying issue here really has to do with when custom
> plan providers get invoked - what triggers that? For foreign data
> wrappers, we have some relations that are plain tables (relkind = 'r')
> and no foreign data wrapper code is invoked. We have others that are
> flagged as foreign tables (relkind = 'f') and for those we look up the
> matching FDW (via ftserver) and run the code. Similarly, for an index
> AM, we notice that the relation is an index (relkind = 'r') and then
> consult relam to figure out which index AM we should invoke. But as
> KaiGai is conceiving this feature, it's quite different. Rather than
> applying only to particular relations, and being mutually exclusive
> with other options that might apply to those relations, it applies to
> *everything* in the database in addition to whatever other options may
> be present. The included ctidscan implementation is just as good an
> example as PG-Strom: you inspect the query and see, based on the
> operators present, whether there's any hope of accelerating things.
> In other words, there's no user configuration - and also, not
> irrelevantly, no persistent on-disk state the way you have for an
> index, or even an FDW, which has on disk state to the extent that
> there have to be catalog entries tying a particular FDW to a
> particular table.
>
Yes, that's my point. In case of FDW or index AM, the query planner
can have some expectations how relevant executor node will handle
the given relation scan according to the persistent state.
However, custom-plan is a black-box to the query planner, and it
cannot have expectation of how relation scan is handled, except for
the cost value estimated by custom-plan provider.
Thus, this interface is designed to invoke custom-plan providers being
registered on relation scan, to ask them whether it can offer alternative
way to scan.

Probably, it may have a shortcut to skip invocation in case when custom-
plan provider obviously cannot provide any alternative plan.
For example, we may add a flag to RegisterCustomPlanProvider() to
tell this custom-plan provider works on only relkind='r', thus no need to
invoke on other relation types.

> A lot of the previous discussion of this topic revolves around the
> question of whether we can unify the use case that this patch is
> targeting with other things - e.g. Citus's desire to store its data
> files within the data directory while retaining control over data
> access, thus not a perfect fit for FDWs; the desire to push joins down
> to foreign servers; more generally, the desire to replace a join with
> a custom plan that may or may not use access paths for the underlying
> relations as subpaths. I confess I'm not seeing a whole lot of
> commonality with anything other than the custom-join-path idea, which
> probably shares many of what I believe to be the relevant
> characteristics of a custom scan as conceived by KaiGai: namely, that
> all of the decisions about whether to inject a custom path in
> particular circumstances are left up to the provider itself based on
> inspection of the specific query, rather than being a result of user
> configuration.
>
> So, I'm tentatively in favor of stripping the DDL support out of this
> patch and otherwise trying to boil it down to something that's really
> minimal, but I'd like to hear what others think.
>
I'd like to follow this direction, and start stripping the DDL support.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-23 02:04:50 Re: Is this a bug?
Previous Message Tomas Vondra 2014-08-23 01:02:52 Re: 9.5: Better memory accounting, towards memory-bounded HashAgg