Re: delta relations in AFTER triggers

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: delta relations in AFTER triggers
Date: 2016-11-21 17:04:34
Message-ID: CACjxUsOWAK4-rLWSAsa4FfAUX6Mxa41-TtCVN-D4UysJsod7BA@mail.gmail.com
Lists: pgsql-hackers

Thanks for the review! Will respond further after reviewing your
suggested patches; this is a quick response just to the contents of
the email.

On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> Both ways have attracted criticism: the first involves touching
> basically every core function that might eventually parse, plan or
> execute a query to make it accept a Tsrcache and pass that on, and the
> second involves a bunch of Rube Goldberg machine-like
> callback/variable/parameter code.

Just to quantify "touching basically every core function that
might...", the number of functions which have a change in signature
(adding one parameter) is seven. No more; no less.

> I spent some time investigating whether a third way would be viable:
> use ParamListInfo's setupParser hook and add an analogous one for the
> executor, so that there is no registry equivalent to Tsrcache, but
> also no dealing with param slots or (in plpgsql's case) new kinds of
> variables. Instead, there would just be two callbacks: one for asking
> the tuplestore provider for metadata (statistics, TupleDesc) at
> planning time, and another for asking for the tuplestore by name at
> execution time. One problem with this approach is that it requires
> using the SPI_*_with_paramlist interfaces, not the more commonly used
> arg-based versions, and requires using a ParamListInfo when you
> otherwise have no reason to because you have no parameters. Also,
> dealing with callbacks to register your tuplestore supplier is a
> little clunky. More seriously, requiring providers of those callbacks
> to write code that directly manipulates ParserState and EState and
> calls addRangeTableEntryXXX seems like a modularity violation --
> especially for PLs that are less integrated with core Postgres code
> than plpgsql. I got this more or less working, but didn't like it
> much and didn't think it would pass muster.

ok

> After going through that experience, I now agree with Kevin: an
> interface where a new SPI interface lets PLs push a named tuplestore
> into the SPI connection to make it available to SQL seems like the
> simplest and tidiest way. I do have some feedback and suggestions
> though:
>
> 1. Naming: Using tuplestores in AfterTriggersData make perfect sense
> to me but I don't think it follows that the exact thing we should
> expose to the executor to allow these to be scanned is a
> TuplestoreScan. We have other nodes that essentially scan a
> tuplestore like WorkTableScan and Material but they have names that
> tell you what they are for. This is for scanning data that has either
> been conjured up or passed on by SPI client code and exposed to SQL
> queries. How about SpiRelationScan, SpiDataScan, SpiRelVarScan, ....?

I think an SPI-centered approach is the wrong way to go. I feel
that an SPI *interface* will be very useful, and will probably
save thousands of lines of fragile code which would otherwise
blur the lines of the layering, but I feel there should
be a lower-level interface, and the SPI interface should use that
to provide a convenience layer. In particular, I suspect that some
uses of these named tuplestore relations will initially use SPI for
convenience of development, but may later move some of that code to
dealing with parse trees, for performance. Ideally, the execution
plan would be identical whether or not SPI was involved, so naming
implying the involvement of SPI would be misleading.
NamedTuplestoreScan might be an improvement over just
TuplestoreScan.

> Also, Tsrcache is strangely named: it's not exactly a cache, it's
> more of a registry.

When I used the word "cache" here, I was thinking more of this
English language definition:

a : a hiding place especially for concealing and preserving
provisions or implements
b : a secure place of storage

The intent being to emphasize that there is not one public
"registry" of such objects, but context-specific collections where
references are tucked away when they become available for later use
in only the appropriate context. Eventually, when these are
used for some of the less "eager" timings of materialized view
maintenance, they may be set aside for relatively extended periods
(i.e., minutes or maybe even hours) before being used. Neither
"registry" nor "cache" seems quite right; maybe someone can think
of a word with more accurate semantics.

> 2. Scope of changes: If we're going to touch functions all over the
> source tree to get the Tsrcache where it needs to be for parsing and
> execution, then I wonder if we should consider thinking a bit more
> about where this is going. What if we had a thing called a
> QueryEnvironment, and we passed a pointer to that into to all those
> places, and it could contain the named tuplestore registry?

I agree. I had a version building on the Tsrcache approach which
differentiated between three levels of generality: Ephemeral
Relations, a subclass of that called Named Ephemeral Relations, and
a subclass of that called Named Tuplestore Relations. That
experiment seems to have been lost somewhere along the way, but I
think it was fairly easy to draw those lines in the Tsrcache
version to support other types of lightweight relations. I didn't
have the concept of a QueryEnvironment in that; I would be
interested in hearing more about how you see that working.
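
[Editor's sketch: one way the suggested QueryEnvironment might work,
modeled in Python. The real thing would be a C struct whose pointer is
threaded through the parser, planner, and executor; every name below is
hypothetical.]

```python
# Hypothetical QueryEnvironment: one object passed through parse, plan,
# and execute, carrying named ephemeral relations and their metadata.

class QueryEnvironment:
    def __init__(self):
        self._named_relations = {}

    def register_relation(self, name, tupdesc, rows):
        # tupdesc stands in for the column metadata the planner needs.
        self._named_relations[name] = {'tupdesc': tupdesc, 'rows': rows}

    def get_relation(self, name):
        return self._named_relations.get(name)

def parse(relation_name, env):
    # Parser: resolve the name against the environment, or fail the way
    # an unknown relation would.
    rel = env.get_relation(relation_name)
    if rel is None:
        raise LookupError('relation "%s" does not exist' % relation_name)
    return {'relation': relation_name, 'tupdesc': rel['tupdesc']}

def execute(plan, env):
    # Executor: fetch the tuplestore by name at execution time.
    return env.get_relation(plan['relation'])['rows']

env = QueryEnvironment()
env.register_relation('old_table', ('id', 'val'), [(1, 'a'), (2, 'b')])
plan = parse('old_table', env)
assert execute(plan, env) == [(1, 'a'), (2, 'b')]
```

Because the same environment pointer reaches both ends, nothing global
is needed, and different callers can hold different environments.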

> See attached patches:
>
> * spi-relation-v1.patch, which provides: (1) an SpiRelationScan
> executor node, (2) the SPI interface required to feed data to it, (3)
> a new QueryEnvironment struct which is used to convey SpiRelation into
> the right bits of the planner and executor; this patch is based on
> fragments extracted from the -noapi and -tsr patches, and modified as
> described above
>
> * spi-relation-plpgsql-v1.patch, to teach plpgsql how to expose the
> new and old tables to SQL via the above
>
> * spi-relation-plpython-v1.patch, ditto for plpython; this patch makes
> the OLD TABLE and NEW TABLE automatically available to any query you
> run via plpy.execute, and is intended to show that the code required
> to make this work for each PL is small, in support of Kevin's earlier
> argument (a more featureful patch would presumably also expose
> the contents of the tuplestores directly to Python code, and let
> Python code create arbitrary new tuplestores and expose those to SQL
> queries)

Right, I think we should assume that there will be other ways
people want to use parts of what is done here, including building
tuplestores through other means and referencing them in queries.
As I said, I think we should make that very easy to do through SPI
while providing a low-level mechanism for those building parse
trees directly in C code. If we do this right, we might eventually
want to collapse CTEs or some other existing types of ephemeral
relations into this framework.
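
[Editor's sketch: the "collapse other ephemeral relations into this
framework" idea, as a hypothetical Python illustration. The kind tags
and class names are invented.]

```python
# Hypothetical sketch: different sources of ephemeral relations (trigger
# transition tables, CTE results, PL-built tuplestores) unified behind
# one lookup, so the executor never cares where the rows came from.

ENR_NAMED_TUPLESTORE = 'named_tuplestore'
ENR_CTE = 'cte'

class EphemeralNamedRelation:
    def __init__(self, name, kind, rows):
        self.name = name
        self.kind = kind
        self.rows = rows

class Environment:
    def __init__(self):
        self._enrs = {}

    def add(self, enr):
        self._enrs[enr.name] = enr

    def scan(self, name):
        # One scan path for every kind of ephemeral relation.
        return self._enrs[name].rows

env = Environment()
env.add(EphemeralNamedRelation('new_table', ENR_NAMED_TUPLESTORE, [(1,)]))
env.add(EphemeralNamedRelation('cte_result', ENR_CTE, [(2,)]))
assert env.scan('new_table') == [(1,)]
assert env.scan('cte_result') == [(2,)]
```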

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
