Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, hlinnaka <hlinnaka(at)iki(dot)fi>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)
Date: 2015-07-15 16:19:59
Message-ID: CA+TgmoYesY7uekDM44omqB1_pDQvQruXrMQbKAePDUxbS1XiWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 14, 2015 at 6:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Both you and Andres have articulated the concern that CustomScan isn't
>> actually useful, but I still don't really understand why not.
>
>> I'm curious, for example, whether CustomScan would have been
>> sufficient to build TABLESAMPLE, and if not, why not. Obviously the
>> syntax has to be in core,
>
> ... so you just made the point ...

If you're going to set the bar that high, there's no point in anyone
ever trying to add extensibility anywhere, because there will always
be some other place where there isn't enough extensibility someplace
else to do everything that someone might want. The fact that the
parser isn't extensible doesn't make custom scans useless any more
than the fact that the non-extensibility of WAL-logging makes pg_am
useless.

>> but why couldn't the syntax just call an
>> extension-provided callback that returns a custom scan, instead of
>> having a node just for TABLESAMPLE?
>
> Because that only works for small values of "work". As far as TABLESAMPLE
> goes, I intend to hold Simon's feet to the fire until there's a less
> cheesy answer to the problem of scan reproducibility. Assuming we're
> going to allow sample methods that can't meet the reproducibility
> requirement, we need something to prevent them from producing visibly
> broken query results. Ideally, the planner would avoid putting such a
> scan on the inside of a nestloop. A CustomScan-based implementation could
> not possibly arrange such a thing; we'd have to teach the core planner
> about the concern.

Well, I think it would be better to change the tablesample methods as
you previously proposed, so that they are actually stable. But if
that's not possible, then when you (or someone) makes the core planner
able to consider such matters, you could add a CUSTOM_PATH_UNSTABLE
flag that a custom path can use to notify the core system about the
problem. Then tablesample methods that are unstable can set the flag
and those that are stable are not penalized. Presumably we'll end up
with such a flag in the tablesample-path anyway, and a separate flag
in every kind of future scan that needs similar consideration. If we
put it in some piece of infrastructure (like the custom-path stuff)
that is reusable, we could avoid reinventing the wheel every time.

> Or, taking the example of a GpuScan node, it's essentially impossible
> to persuade the planner to delegate any expensive function calculations,
> aggregates, etc to such a node; much less teach it that that way is cheaper
> than doing such things the usual way. So yeah, KaiGai-san may have a
> module that does a few things with a GPU, but it's far from doing all or
> even very much of what one would want.

It's true that there are things he can't do this way, but without the
custom-scan stuff, he'd be able to do even less.

> Now, as part of the upper-planner-rewrite business that I keep hoping to
> get to when I'm not riding herd on bad patches, it's likely that we might
> have enough new infrastructure soon that that particular problem could
> be solved. But there would just be another problem after that; a likely
> example is not having adequate statistics, or sufficiently fine-grained
> function cost estimates, to be able to make valid choices once there's
> more than one way to do such calculations. (I'm not really impressed by
> "the GPU is *always* faster" approaches.) Significant improvements of
> that sort are going to take core-code changes.

Probably, but giving people the ability to experiment outside core,
even if it's limited, often leads to good things. And when it
doesn't, it doesn't really cost us anything.

> Even worse, if there do get to be any popular custom-scan extensions,
> we'll break them anytime we make any nontrivial planner changes, because
> there is no arms-length API there. A trivial example is that even adding
> or changing any fields in struct Path will necessarily break custom scan
> providers, because they build Paths for themselves with no interposed API.

I agree that will happen, but I don't care. This happens all the
time, and every person other than yourself who has commented on this
issue has said that they'd rather have an API exposed that will later
get whacked around without warning than have no API exposed at all,
not just with respect to custom-paths, but with a whole variety of
other things as well, like sticking PGDLLIMPORT on all of the
variables that back GUCs. It is really far more tedious to try to
work around the lack of a PGDLLIMPORT marking (which causes a huge
annoyance now) than to adjust the code if and when somebody changes
something about the GUC (which causes a small annoyance later, and
only if there actually are changes).

> In large part this is the same as my core concern about the TABLESAMPLE
> patch: exposing dubiously-designed APIs is soon going to force us to make
> choices between breaking those APIs or not being able to make changes we
> need to make. In the case of custom scans, I will not be particularly
> sad when (not if) we break custom scan providers; but in other cases such
> tradeoffs are going to be harder to make.

I'll vote for meaningful forward progress over refusing to change the
API in every single case. I don't think I'll be remotely alone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-07-15 16:29:04 Re: ctidscan as an example of custom-scan (Re: [v9.5] Custom Plan API)
Previous Message Simon Riggs 2015-07-15 15:52:58 Re: Implementation of global temporary tables?