Re: [v9.5] Custom Plan API

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-08-22 15:39:53
Message-ID: CA+TgmoZ2NgqBJ=ZeZSBefKBqetyAP7m3xsEZrDWun0KBoeMWNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> I haven't followed this at all, but I just skimmed over it and noticed
>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>> before. It seems a bit odd to me; why isn't it sufficient to have a
>> boolean flag in regular CustomPlan to indicate that it supports
>> mark/restore?
>
> Yeah, I thought that was pretty bogus too, but it's well down the
> list of issues that were there last time I looked at this ...

I think the threshold question for this incarnation of the patch is
whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
a way of installing new plan providers into the database. If we are,
then I can go ahead and enumerate a long list of things that will need
to be fixed to make that code acceptable (such as adding pg_dump
support). But if we're not, there's no point in spending any time on
that part of the patch.

I can see a couple of good reasons to think that this approach might
be reasonable:

- In some ways, a custom plan provider (really, at this point, a
custom scan provider) is very similar to a foreign data wrapper. To
the guts of PostgreSQL, an FDW is a sort of black box that knows how
to scan some data not managed by PostgreSQL. A custom plan provider
is similar, except that it scans data that *is* managed by PostgreSQL.

- There's also some passing resemblance between a custom plan provider
and an access method. Access methods provide a set of APIs for fast
access to data via an index, while custom plan providers provide an
API for fast access to data via magic that the core system doesn't
(and need not) understand. While access methods don't have associated
SQL syntax, they do include catalog structure, so perhaps this should
too, by analogy.

All that having been said, I'm having a hard time mustering up
enthusiasm for this way of doing things. As currently constituted,
the pg_custom_plan_provider catalog contains only a name, a "class"
that is always 's' for scan, and a handler function OID. Quite
frankly, that's a whole lot of nothing. If we got rid of the
pg_catalog structure and just had something like
RegisterCustomPlanProvider(char *name, void (*)(customScanArg *),
which could be invoked from _PG_init(), hundreds and hundreds of lines
of code could go away and we wouldn't lose any actual functionality;
you'd just list your custom plan providers in shared_preload_libraries
or local_preload_libraries instead of listing them in a system
catalog. In fact, you might even have more functionality, because you
could load providers into particular sessions rather than system-wide,
which isn't possible with this design.

I think the underlying issue here really has to do with when custom
plan providers get invoked - what triggers that? For foreign data
wrappers, we have some relations that are plain tables (relkind = 'r')
and no foreign data wrapper code is invoked. We have others that are
flagged as foreign tables (relkind = 'f') and for those we look up the
matching FDW (via ftserver) and run the code. Similarly, for an index
AM, we notice that the relation is an index (relkind = 'r') and then
consult relam to figure out which index AM we should invoke. But as
KaiGai is conceiving this feature, it's quite different. Rather than
applying only to particular relations, and being mutually exclusive
with other options that might apply to those relations, it applies to
*everything* in the database in addition to whatever other options may
be present. The included ctidscan implementation is just as good an
example as PG-Strom: you inspect the query and see, based on the
operators present, whether there's any hope of accelerating things.
In other words, there's no user configuration - and also, not
irrelevantly, no persistent on-disk state the way you have for an
index, or even an FDW, which has on disk state to the extent that
there have to be catalog entries tying a particular FDW to a
particular table.

A lot of the previous discussion of this topic revolves around the
question of whether we can unify the use case that this patch is
targeting with other things - e.g. Citus's desire to store its data
files within the data directory while retaining control over data
access, thus not a perfect fit for FDWs; the desire to push joins down
to foreign servers; more generally, the desire to replace a join with
a custom plan that may or may not use access paths for the underlying
relations as subpaths. I confess I'm not seeing a whole lot of
commonality with anything other than the custom-join-path idea, which
probably shares many of what I believe to be the relevant
characteristics of a custom scan as conceived by KaiGai: namely, that
all of the decisions about whether to inject a custom path in
particular circumstances are left up to the provider itself based on
inspection of the specific query, rather than being a result of user
configuration.

So, I'm tentatively in favor of stripping the DDL support out of this
patch and otherwise trying to boil it down to something that's really
minimal, but I'd like to hear what others think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-08-22 15:59:09 Re: [PATCH] Incremental backup: add backup profile to base backup
Previous Message Andrew Gierth 2014-08-22 15:34:11 Re: WIP Patch for GROUPING SETS phase 1