Re: Faster methods for getting SPI results (460% improvement)

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Faster methods for getting SPI results (460% improvement)
Date: 2017-04-06 07:38:17
Message-ID: CAMsr+YHP2D44FG+oAzuUqZUEiJ=nkywo5XSmk4dptmoZK6Um9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6 April 2017 at 11:50, Jim Nasby <jim(dot)nasby(at)openscg(dot)com> wrote:

> Attached is a complete series of patches that includes the docs patch.

+ <function>SPI_execute_callback</function> is the same as
+ <function>SPI_execute</function>, except that instead of returning results
+ via <structname>SPITupleTable</structname>, the user-supplied
<parameter>callback</parameter>
+ is used. Unlike
+ <function>SPI_execute</function>,
+ <function>SPI_execute_callback</function>
+ will run the callback for every SQL command passed in to
<parameter>command</parameter>.

This doesn't explain why the user should care or prefer this approach.
Maybe after "Unlike":

"<function>SPI_execute_callback</> does not need to accumulate all the
query results into memory before the caller can process them. Instead
of building a <literal>SPI_tuptable</> containing all the results, the
supplied callback is invoked for each row processed by SPI. The row
data is passed to the callback then discarded when the callback
returns. This reduces copying and allows the application to process
results sooner."

The docs do not discuss memory lifetimes. What memory context is each
call invoked in and when is it reset? Something like:

"<literal>rStartup</>, <literal>receiveSlot</> and
<literal>rShutdown</> are all called in a memory context that is reset
for each <function>SPI_execute_callback</>."

Also, what rules apply in terms of what you can/cannot do from within
a callback? Presumably it's unsafe to perform additional SPI calls,
perform transactions, call into the executor, change the current
snapshot, etc, but I would consider that reasonably obvious. Are there
any specific things to avoid?

Under what circumstances is the query execution context [...] freed"
such that the destroy callback is called?

It is not clear to me when reading the document that the user of
SPI_execute_callback is expected to define their own
DestReceiver-compatible struct, and that the private fields that "may
appear beyond this point" are those defined by the application. How
does the app know that its DestReceiver is compatible with the result
of CreateDestReceiver(DestSPICallback)? Should it copy the fields?
etc. Your PLPython example incorporates DestReceiver by-value in its
own state struct; maybe your docs program listing should illustrate
that instead, show how to initialize the destreceiver member, and
separately list the callbacks in destreceiver in an <itemisedlist> ?

It definitely needs to mention the relevant parts of plpython to look
at for an example of usage.

> Right now, the docs don't include a concrete example, because adding one
> would be a pretty large if it demonstrated real usage, which presumably
> means Yet Another Contrib Module strictly for the purpose of demonstrating
> something. Rather than doing that, ISTM it'd be better to point the user at
> what plpythonu is doing.

Seems fine to me.

Notes on the docs aside, I am pretty happy with this and think it's
reasonable to proceed with it for Pg 10.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-04-06 07:50:05 Re: Faster methods for getting SPI results (460% improvement)
Previous Message Michael Paquier 2017-04-06 07:32:08 Re: Implementation of SASLprep for SCRAM-SHA-256