Re: delta relations in AFTER triggers

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: delta relations in AFTER triggers
Date: 2016-11-21 07:05:06
Message-ID: CAEepm=3NvS1MZqzJ5T-rh2xtshqEXwCRCL16Bwhksy1EY03g6A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 5, 2016 at 5:09 AM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
> Hearing none, done. Hopefully that makes what remains easier to
> review.

So, as of commit 8c48375e we have the standard syntax REFERENCING
NEW/OLD TABLE AS so that you can provide the name for transition
relations that trigger functions can access, and they are filled with
before and after tuples as appropriate and made available to trigger
code via the AfterTriggersData struct. So now we can write triggers
in C and access those tuplestores. Great!
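To make that concrete, the committed syntax looks like this (the table, trigger, and function names below are invented for illustration; at this stage the trigger function has to be written in C to read the tuplestores):

```sql
CREATE TRIGGER accounts_audit
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE audit_func();   -- audit_func() written in C, for now
```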

I spent some time last week reviewing the remaining patches in the
series. They've bit-rotted a bit and one is incomplete, but it's
clear enough what's going on. First, we have transition-noapi-v7.diff
which introduces a new executor node "TuplestoreScan" and a named
tuplestore registry mechanism called "Tsrcache". Then we have a
choice of:

1. transition-tsr-v7.diff, which allows SPI client code to register
tuplestores + associated metadata through a new SPI interface, and
which passes Tsrcache objects to all the places they are needed.

2. transition-via-params-v7.diff, which allows SPI client code to
provide a new parser hook to resolve references to the new relations,
analogous to the way that plpgsql deals with variables referenced in
SQL statements, and then does some juggling to get the tuplestore to
the executor node via a parameter slot.

Both ways have attracted criticism: the first involves touching
basically every core function that might eventually parse, plan or
execute a query to make it accept a Tsrcache and pass that on, and the
second involves a bunch of Rube Goldberg machine-like
callback/variable/parameter code.

I spent some time investigating whether a third way would be viable:
use ParamListInfo's setupParser hook and add an analogous one for the
executor, so that there is no registry equivalent to Tsrcache, but
also no dealing with param slots or (in plpgsql's case) new kinds of
variables. Instead, there would just be two callbacks: one for asking
the tuplestore provider for metadata (statistics, TupleDesc) at
planning time, and another for asking for the tuplestore by name at
execution time. One problem with this approach is that it requires
using the SPI_*_with_paramlist interfaces, not the more commonly used
arg-based versions, and requires using a ParamListInfo when you
otherwise have no reason to because you have no parameters. Also,
dealing with callbacks to register your tuplestore supplier is a
little clunky. More seriously, requiring providers of those callbacks
to write code that directly manipulates ParserState and EState and
calls addRangeTableEntryXXX seems like a modularity violation --
especially for PLs that are less integrated with core Postgres code
than plpgsql. I got this more or less working, but didn't like it
much and didn't think it would pass muster.

After going through that experience, I now agree with Kevin: an
interface where a new SPI interface lets PLs push a named tuplestore
into the SPI connection to make it available to SQL seems like the
simplest and tidiest way. I do have some feedback and suggestions
though:

1. Naming: Using tuplestores in AfterTriggersData makes perfect sense
to me but I don't think it follows that the exact thing we should
expose to the executor to allow these to be scanned is a
TuplestoreScan. We have other nodes that essentially scan a
tuplestore like WorkTableScan and Material but they have names that
tell you what they are for. This is for scanning data that has either
been conjured up or passed on by SPI client code and exposed to SQL
queries. How about SpiRelationScan, SpiDataScan, SpiRelVarScan, ...?
Also, Tsrcache is strangely named: it's not exactly a cache, it's
more of a registry.

2. Scope of changes: If we're going to touch functions all over the
source tree to get the Tsrcache where it needs to be for parsing and
execution, then I wonder if we should consider thinking a bit more
about where this is going. What if we had a thing called a
QueryEnvironment, and we passed a pointer to that into all those
places, and it could contain the named tuplestore registry? Then we
wouldn't have to change all those interfaces again in future if
someone wants to allow more kinds of transient objects to be injected
into the SQL namespace via SPI. For example, future uses could
include transient functions (ie uncatalogued functions that are made
available to a query by the SPI client using a callback to support
private and nested functions), or relations provided via SPI that emit
tuples one-at-a-time, or more sophisticated kinds of transient
relations that support writes, indexes and constraints along the lines
of those in SQL Server's T-SQL.

See attached patches:

* spi-relation-v1.patch, which provides: (1) an SpiRelationScan
executor node, (2) the SPI interface required to feed data to it, (3)
a new QueryEnvironment struct which is used to convey SpiRelation into
the right bits of the planner and executor; this patch is based on
fragments extracted from the -noapi and -tsr patches, and modified as
described above

* spi-relation-plpgsql-v1.patch, to teach plpgsql how to expose the
new and old tables to SQL via the above

* spi-relation-plpython-v1.patch, ditto for plpython; this patch makes
the OLD TABLE and NEW TABLE automatically available to any query you
run via plpy.execute, and is intended to show that the code required
to make this work for each PL is small, in support of Kevin's earlier
argument (a more featureful patch would presumably also expose
the contents of the tuplestores directly to Python code, and let
Python code create arbitrary new tuplestores and expose those to SQL
queries)
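To show the intended usage, a statement-level trigger under the plpython patch might look like the following. The names are illustrative, and the transition table name visible to the query is whatever the REFERENCING clause declares:

```sql
CREATE FUNCTION summarize_update() RETURNS trigger
LANGUAGE plpythonu AS $$
    # With spi-relation-plpython-v1.patch applied, queries run via
    # plpy.execute can see the transition tables directly.
    result = plpy.execute("SELECT count(*) AS n FROM new_rows")
    plpy.notice("rows updated: %s" % result[0]["n"])
$$;

CREATE TRIGGER accounts_summary
    AFTER UPDATE ON accounts
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE summarize_update();
```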

Thoughts?

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
spi-relation-v1.patch application/octet-stream 99.1 KB
spi-relation-plpgsql-v1.patch application/octet-stream 7.7 KB
spi-relation-plpython-v1.patch application/octet-stream 3.2 KB
