Re: [v9.5] Custom Plan API

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-08-31 04:54:23
Message-ID: CADyhKSX_NttO-GgJ7k4EtRjf+eUzZG6i=tUOx1kuWmyVFxCkVQ@mail.gmail.com

2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>>> > I'd like to follow this direction, and start stripping the DDL support.
>>>
>>> ...please make it so.
>>>
>> The attached patch eliminates DDL support.
>>
>> Instead of the new CREATE CUSTOM PLAN PROVIDER statement,
>> it adds an internal function, register_custom_scan_provider,
>> that takes a custom plan provider name and a callback function.
>> The callback adds an alternative scan path (in the form of a
>> CustomPath) while the query planner is searching for the cheapest
>> path to scan the target relation.
>> The documentation is also revised to match the latest design.
>> Everything else keeps the previous design.
>
> Comments:
>
> 1. There seems to be no reason for custom plan nodes to have MultiExec
> support; I think this is an area where extensibility is extremely
> unlikely to work out. The MultiExec mechanism is really only viable
> between closely-cooperating nodes, like Hash and HashJoin, or
> BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably
> those things could have been written as a single, more complex node.
> Are we really going to want to support a custom plan that can
> substitute for a Hash or BitmapAnd node? I really doubt that's very
> useful.
>
This is intended to allow a particular custom-scan provider to exchange
its internal data when multiple custom-scan nodes are stacked.
So it can be considered a facility for implementing closely-cooperating
nodes, both of which are managed by the same custom-scan provider.
An example is a GPU-accelerated version of hash join that takes an
underlying custom-scan node which returns a hash table in a GPU-friendly
data structure; that result should not go through the row-by-row
interface.
I believe it is valuable for some use cases, even though I couldn't find
a use for it in the ctidscan example.
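
To make the idea concrete, here is a minimal toy sketch of two stacked
nodes owned by the same provider, where the upper node fetches a
provider-private table from the lower node in one bulk call instead of
pulling rows one at a time. None of these types or function names
(ToyNode, ToyBulkTable, toy_inner_multi_exec, toy_join_count_matches)
exist in PostgreSQL or in the patch; they are assumptions made only for
illustration.

```c
#include <stddef.h>

/* Toy model only: none of these types are PostgreSQL executor structures. */
typedef struct ToyNode ToyNode;

struct ToyNode
{
    /* row-by-row entry point: returns 1 and fills *row, or 0 at end */
    int       (*exec_row) (ToyNode *self, int *row);
    /* bulk entry point: returns a provider-private object in one call */
    void     *(*multi_exec) (ToyNode *self);
    ToyNode    *child;
    void       *private_state;
};

/* Provider-private bulk result, standing in for a GPU-friendly hash table. */
typedef struct ToyBulkTable
{
    int         nitems;
    int         keys[8];
} ToyBulkTable;

/* Lower node: builds the whole table at once and hands it upward. */
static void *
toy_inner_multi_exec(ToyNode *self)
{
    ToyBulkTable *table = (ToyBulkTable *) self->private_state;
    int         i;

    table->nitems = 4;
    for (i = 0; i < table->nitems; i++)
        table->keys[i] = (i + 1) * 10;
    return table;
}

/*
 * Upper node: knows the private format because the same provider owns
 * both nodes, so it can consume the table directly.
 */
static int
toy_join_count_matches(ToyNode *join, const int *probe, int nprobe)
{
    ToyBulkTable *table = (ToyBulkTable *) join->child->multi_exec(join->child);
    int         matches = 0;
    int         i, j;

    for (i = 0; i < nprobe; i++)
        for (j = 0; j < table->nitems; j++)
            if (probe[i] == table->keys[j])
                matches++;
    return matches;
}
```

The point of the sketch is only that the object returned by the bulk
call has no meaning to a generic parent node; it works because both
ends agree on the private layout.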

> 2. This patch is still sort of on the fence about whether we're
> implementing custom plans (of any type) or custom scans (thus, of some
> particular relation). I previously recommended that we confine
> ourselves initially to the task of adding custom *scans* and leave the
> question of other kinds of custom plan nodes to a future patch. After
> studying the latest patch, I'm inclined to suggest a slightly revised
> strategy. This patch is really adding THREE kinds of custom objects:
> CustomPlanState, CustomPlan, and CustomPath. CustomPlanState inherits
> from ScanState, so it is not really a generic CustomPlan, but
> specifically a CustomScan; likewise, CustomPlan inherits from Scan,
> and is therefore a CustomScan, not a CustomPlan. But CustomPath is
> different: it's just a Path. Even if we only have the hooks to inject
> CustomPaths that are effectively scans at this point, I think that
> part of the infrastructure could be somewhat generic. Perhaps
> eventually we have CustomPath which can generate either CustomScan or
> CustomJoin which in turn could generate CustomScanState and
> CustomJoinState.
>
The suggestion seems reasonable to me. The reason why CustomPlanState
inherits from ScanState and CustomPlan inherits from Scan is simply
convenience for extension implementations. Some useful internal APIs,
like ExecScan(), take a ScanState argument, so it was a better strategy
to choose Scan/ScanState instead of the bare Plan/PlanState.
Anyway, I'd like to follow the perspective that regards CustomScan as
one derivative of CustomPath. It is more flexible.
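
As a rough illustration of why inheriting from the scan-level struct is
convenient, the sketch below uses the usual C idiom of embedding the
base struct as the first member, so a helper written against the base
type can be reused unchanged. All names here (ToyPlanState,
ToyScanState, ToyCustomScanState, toy_exec_scan, toy_cast_custom) are
hypothetical stand-ins, not the real executor definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real PostgreSQL structs carry far more state. */
typedef enum ToyNodeTag
{
    T_ToyScanState,
    T_ToyCustomScanState
} ToyNodeTag;

typedef struct ToyPlanState
{
    ToyNodeTag  type;
} ToyPlanState;

typedef struct ToyScanState
{
    ToyPlanState ps;            /* base "class" must be the first member */
    int         current_row;
    int         total_rows;
} ToyScanState;

typedef struct ToyCustomScanState
{
    ToyScanState ss;            /* inherits from the scan state, not the bare plan state */
    const char *provider_name;
} ToyCustomScanState;

/*
 * Generic helper written against the scan-level base type, playing the
 * role that ExecScan() plays in the discussion above.
 * Returns the next row number, or -1 at end of scan.
 */
static int
toy_exec_scan(ToyScanState *node)
{
    if (node->current_row >= node->total_rows)
        return -1;
    return node->current_row++;
}

/* Checked downcast from the most generic type back to the custom node. */
static ToyCustomScanState *
toy_cast_custom(ToyPlanState *ps)
{
    assert(ps->type == T_ToyCustomScanState);
    return (ToyCustomScanState *) ps;
}
```

Had the custom node embedded only ToyPlanState, the extension would
need its own copy of the scan-loop bookkeeping before it could use
toy_exec_scan().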

> For now, I propose that we rename CustomPlan and CustomPlanState to
> CustomScan and CustomScanState, because that's what they are; but that
> we leave CustomPath as-is. For ease of review, I also suggest
> splitting this into a series of three patches: (1) add support for
> CustomPath; (2) add support for CustomScan and CustomScanState; (3)
> ctidscan.
>
OK, I'll do that.

> 3. Is it really a good idea to invoke custom scan providers for RTEs
> of every type? It's pretty hard to imagine that a custom scan
> provider can do anything useful with, say, RTE_VALUES. Maybe an
> accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but
> even that feels like an awfully big stretch. At least until clear use
> cases emerge, I'd be inclined to restrict this to RTE_RELATION scans
> where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in
> set_plain_rel_pathlist() rather than set_rel_pathlist().
>
I agree. Indeed, it's not easy to imagine a use case for custom logic
on non-plain relations.
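
The restriction proposed above amounts to a simple filter in front of
the provider callbacks. A minimal sketch, using toy enums rather than
the real RTEKind and relkind definitions (every name below is a
hypothetical placeholder):

```c
/* Toy stand-ins for the range-table entry kind and the relation kind. */
typedef enum ToyRteKind
{
    TOY_RTE_RELATION,
    TOY_RTE_SUBQUERY,
    TOY_RTE_VALUES,
    TOY_RTE_CTE
} ToyRteKind;

typedef enum ToyRelKind
{
    TOY_RELKIND_PLAIN_TABLE,
    TOY_RELKIND_FOREIGN_TABLE
} ToyRelKind;

/*
 * Returns 1 when custom-scan providers should be consulted: only for
 * scans of ordinary relations that are not foreign tables.
 */
static int
toy_should_call_providers(ToyRteKind rtekind, ToyRelKind relkind)
{
    return rtekind == TOY_RTE_RELATION &&
           relkind != TOY_RELKIND_FOREIGN_TABLE;
}
```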

> (We might even want to consider whether the hook in
> set_plain_rel_pathlist() ought to be allowed to inject a non-custom
> plan; e.g. substitute a scan of relation B for a scan of relation A.
> For example, imagine that B contains all rows from A that satisfy some
> predicate. This could even be useful for foreign tables; e.g.
> substitute a scan of a local copy of a foreign table for a reference
> to that table. But I put all of these ideas in parentheses because
> they're only good ideas to the extent that they don't sidetrack us too
> much.)
>
Hmm... It seems to me we would need another piece of infrastructure to
accept a substitute scan, because add_path() is called on the particular
RelOptInfo that is associated with relation A.
As long as the custom-scan provider "internally" redirects a request to
scan A to a substitute scan of B (taking care of everything else, such
as relation locks), I don't think we need to put other hooks outside of
set_plain_rel_pathlist().

> 4. Department of minor nitpicks. You've got a random 'xs' in the
> comments for ExecSupportsBackwardScan.
>
Sorry, I didn't press 'ctrl' properly when I saved the source code in emacs...

> And, in contrib/ctidscan,
> ctidscan_path_methods, ctidscan_plan_methods, and
> ctidscan_exec_methods can have static initializers; there's no need to
> initialize them at run time in _PG_init().
>
It came from a discussion I had a long time ago during the patch review
of postgres_fdw. I suggested using a static table of FdwRoutine, but it
was pointed out that some compilers raise an error/warning when function
pointers are placed in a static initializer.
I usually use only GCC, so I'm not sure whether this argument is right
or not, although postgres_fdw_handler() does allocate FdwRoutine using
palloc() and then sets each function pointer.
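
For reference, the two styles under discussion look roughly like the
sketch below. As far as standard C is concerned, the address of a
function is an address constant and may appear in a static initializer;
whether some compiler or platform that PostgreSQL supports objects in
practice is not something this sketch settles. The struct and function
names (ToyPathMethods, toy_estimate_cost, toy_init_methods) are
hypothetical and are not the ctidscan or FdwRoutine definitions.

```c
#include <stddef.h>

/* Toy callback table, standing in for a methods struct. */
typedef struct ToyPathMethods
{
    const char *name;
    int       (*estimate_cost) (int ntuples);
} ToyPathMethods;

static int
toy_estimate_cost(int ntuples)
{
    return ntuples * 2;
}

/* Style 1: static initializer, filled in at compile/link time. */
static const ToyPathMethods toy_static_methods = {
    "toyscan",
    toy_estimate_cost
};

/* Style 2: table filled in at run time, e.g. from a module init function. */
static ToyPathMethods toy_runtime_methods;

static void
toy_init_methods(void)
{
    toy_runtime_methods.name = "toyscan";
    toy_runtime_methods.estimate_cost = toy_estimate_cost;
}
```

With the first style the table can also be declared const, which the
run-time style cannot offer.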

Anyway, I'll start revising the patch according to comments 2, 3, and
the first half of 4. I'd also like to hear opinions on comment 1 and
the latter half of 4.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
