Re: Faster methods for getting SPI results

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Faster methods for getting SPI results
Date: 2016-12-21 14:21:58
Message-ID: dc3daba8-1066-0a7c-641a-0fc4afb1c1ad@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/20/16 10:14 PM, Jim Nasby wrote:
> It would be a lot more efficient if we could just grab datums from the
> executor and make a single copy into plpython (or R), letting the PL
> deal with all the memory management overhead.
>
> I briefly looked at using SPI cursors to do just that, but that looks
> even worse: every fetch is executed in a subtransaction, and every fetch
> creates an entire tuplestore even if it's just going to return a single
> value. (But hey, we never claimed cursors were fast...)
>
> Is there any way to avoid all of this? I'm guessing one issue might be
> that we don't want to call an external interpreter while potentially
> holding page pins, but even then couldn't we just copy a single tuple at
> a time and save a huge amount of palloc overhead?

AFAICT that's exactly how DestRemote works: it grabs a raw slot from the
executor, makes sure it's fully expanded, and sends it on it's way via
pq_send*(). So presumably the same could be done for SPI, by creating a
new CommandDest (ISTM none of the existing ones would do what we want).

I'm not sure what the API for this should look like. One possibility is
to have SPI_execute and friends accept a flag that indicates not to
build a tupletable. I don't think a query needs to be read-only to allow
for no tuplestore, so overloading read_only seems like a bad idea.

Another option is to treat this as a "lightweight cursor" that only
allows forward fetches. One nice thing about that option is it leaves
open the possibility of using a small tuplestore for each "fetch",
without all the overhead that a full blown cursor has. This assumes
there are some use cases where you want to operate on relatively small
sets of tuples at a time, but you don't need to materialize the whole
thing in one shot.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2016-12-21 14:25:01 Re: Why does plpython delay composite type resolution?
Previous Message Andrew Dunstan 2016-12-21 14:04:01 Re: pgstattuple documentation clarification