Re: Faster methods for getting SPI results (460% improvement)

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Faster methods for getting SPI results (460% improvement)
Date: 2017-02-23 21:56:41
Message-ID: c537b2e7-38cd-0507-2255-69541c9da7b9@BlueTreble.com
Lists: pgsql-hackers

On 1/23/17 9:23 PM, Jim Nasby wrote:
> I think the last step here is to figure out how to support switching
> between the current behavior and the "columnar" behavior of a dict of lists.
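The two result shapes can be illustrated in plain Python. This is a sketch with made-up sample rows, and `rows_to_columns` is a hypothetical helper, not part of plpy. The row-oriented list of dicts mirrors how plpy.execute results are indexed today (result[i]["col"]), and the dict of lists is the "columnar" form.

```python
def rows_to_columns(rows, column_names):
    """Turn row-oriented results (a list of dicts) into a dict of lists."""
    columns = {name: [] for name in column_names}
    for row in rows:
        for name in column_names:
            columns[name].append(row[name])
    return columns


# Row-oriented: one dict per result row (made-up sample data).
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

# Columnar: one list per column, all of the same length.
columns = rows_to_columns(rows, ["id", "name"])
print(columns)  # {'id': [1, 2], 'name': ['alpha', 'beta']}
```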

I've thought more about this... instead of trying to go from the current
situation of one choice of how results are returned to two choices, I
think it'd be better to just expose to PL/Python the API that the new
Destination type provides to SPI. Specifically, execute a python function
during Portal startup, and a different function for receiving tuples.
There'd be an optional third function for Portal shutdown.
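The proposed lifecycle can be simulated in plain Python. In this sketch, `run_portal` is a hypothetical stand-in for what the C-side DestReceiver would do: call startup once, call receive once per tuple (stopping early if it returns False), then call shutdown if one was supplied. Returning the startup state unchanged when no shutdown function is given is an assumption; the proposal does not say what happens in that case.

```python
def run_portal(column_info, tuples, startup, receive, shutdown=None):
    """Drive one result set through the proposed callbacks (hypothetical).

    startup(column_info) -> (version, state)
    receive(state, tup)  -> bool; False stops delivery early
    shutdown(state)      -> value handed back to the caller (optional)
    """
    _version, state = startup(list(column_info))
    for tup in tuples:
        if not receive(state, tup):
            break
    if shutdown is None:
        # Assumption: with no shutdown function, the state object is returned as-is.
        return state
    return shutdown(state)


# A trivial callback set that only counts rows.
def count_startup(column_info):
    return 1, {"count": 0}


def count_receive(state, tup):
    state["count"] += 1
    return True


def count_shutdown(state):
    return state["count"]


result = run_portal([("id", "int4", 23)], [(1,), (2,), (3,)],
                    count_startup, count_receive, count_shutdown)
print(result)  # 3
```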

The startup function would be handed details of the resultset it was
about to receive, as a list of python tuples (one per column) holding
the results of SPI_fname, SPI_gettype and SPI_gettypeid. This function
would return a callback version number and a python object that would
be kept in the DestReceiver.
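A startup function for the columnar case might look like the following sketch. It assumes the metadata arrives as (fname, typename, typeoid) tuples, one per column, in the order SPI_fname, SPI_gettype and SPI_gettypeid are listed above; `CALLBACK_VERSION` and the layout of the state dict are hypothetical. The type OIDs in the example (23 for int4, 25 for text) are PostgreSQL's built-in ones.

```python
CALLBACK_VERSION = 1  # hypothetical version number of the callback protocol


def columnar_startup(column_info):
    """Set up a dict of lists keyed by column name.

    column_info: list of (fname, typename, typeoid) tuples, one per column.
    Returns (version, state); the state object would live in the DestReceiver.
    """
    names = [fname for fname, _typename, _typeoid in column_info]
    state = {
        "names": names,  # column order, used to map positional tuple values
        "columns": {name: [] for name in names},
    }
    return CALLBACK_VERSION, state


version, state = columnar_startup([("id", "int4", 23), ("name", "text", 25)])
print(version, state["columns"])  # 1 {'id': [], 'name': []}
```

A dict keyed by column name silently merges duplicate column names, so a real implementation would have to decide how to handle queries that return them.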

The receiver function would get the object created by the startup
function, as well as a python tuple built from the TupleTableSlot, with
each value already converted to a python object. It would need to add
those values to the object from the startup function. It would return
true or false, just like a Portal receiver function does.
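A matching receive function appends each converted tuple to the lists built at startup. The sketch also shows what the true/false return value is for: returning False, as a DestReceiver's receiveSlot can, tells the caller to stop sending rows. The `max_rows` cutoff, the `nrows` counter and the state layout are illustrative assumptions, not part of the proposal.

```python
def make_columnar_receive(max_rows=None):
    """Build a receive callback that appends each tuple to a dict of lists.

    Returning False mimics a receiver asking the executor to stop sending
    rows; here that happens once max_rows rows have been stored.
    """
    def receive(state, tup):
        for name, value in zip(state["names"], tup):
            state["columns"][name].append(value)
        state["nrows"] = state.get("nrows", 0) + 1
        return max_rows is None or state["nrows"] < max_rows
    return receive


# Hypothetical state, shaped like what a columnar startup function would create.
state = {"names": ["id", "name"], "columns": {"id": [], "name": []}}
receive = make_columnar_receive(max_rows=2)
for tup in [(1, "alpha"), (2, "beta"), (3, "gamma")]:
    if not receive(state, tup):
        break
print(state["columns"])  # {'id': [1, 2], 'name': ['alpha', 'beta']}
```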

The shutdown function would receive the object that's been passed
around. It would be able to do any post-processing. Whatever it returned
is what would be handed back to python as the results of the query.
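Post-processing in the shutdown function could be as simple as dropping bookkeeping and freezing the result, as in this sketch. The state layout and the choice to return tuples are hypothetical; under the proposal, whatever `columnar_shutdown` returns is what plpy.execute would hand back.

```python
def columnar_shutdown(state):
    """Turn the accumulated state into the query result seen by python code.

    Bookkeeping keys ("names", "nrows") are dropped and each column list
    is frozen into a tuple so callers cannot grow it by accident.
    """
    return {name: tuple(values) for name, values in state["columns"].items()}


state = {
    "names": ["id", "name"],
    "columns": {"id": [1, 2], "name": ["alpha", "beta"]},
    "nrows": 2,
}
result = columnar_shutdown(state)
print(result)  # {'id': (1, 2), 'name': ('alpha', 'beta')}
```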

The version number returned by the startup function allows for future
improvements to this facility. One idea there is allowing the startup
function to control how Datums get mapped into python objects.
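One way the version number could let the protocol grow is sketched below. Version 1 returns (version, state) as proposed; version 2 is entirely hypothetical and adds a mapping from type name to a converter function, one possible shape for letting the startup function control how Datums become python objects. The example assumes numeric values arrive as `decimal.Decimal`, which is how PL/Python converts numeric; numeric's type OID is 1700.

```python
from decimal import Decimal


def _identity(value):
    return value


def unpack_startup_result(result):
    """Interpret a startup callback's return value according to its version.

    Version 1: (1, state)
    Version 2 (hypothetical): (2, state, converters), where converters maps a
    type name to a function applied to each value of that type.
    """
    version = result[0]
    if version == 1:
        _, state = result
        converters = {}
    elif version == 2:
        _, state, converters = result
    else:
        raise ValueError(f"unsupported portal callback version: {version}")
    return state, converters


def convert_tuple(column_info, raw_tuple, converters):
    """Apply per-type converters; columns without one pass through unchanged."""
    return tuple(
        converters.get(typename, _identity)(value)
        for (_fname, typename, _typeoid), value in zip(column_info, raw_tuple)
    )


column_info = [("id", "int4", 23), ("price", "numeric", 1700)]
state, converters = unpack_startup_result((2, {}, {"numeric": float}))
print(convert_tuple(column_info, (1, Decimal("9.50")), converters))  # (1, 9.5)
```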

In order to support all of this without breaking backwards compatibility
or forking a new API, plpy.execute would accept keyword arguments (a
kwdict), to avoid conflicting with the arbitrary number of positional
arguments that can currently be accepted. We'd look in the kwdict for a
key called "portal_functions" pointing at a 2- or 3-element tuple of the
startup, receive and shutdown functions. plpy would pre-define a tuple
that provides the current behavior, and that's what would be used by
default. In the future, we might add a way to control the default.

Comments?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
