Re: Faster methods for getting SPI results (460% improvement)

From: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: Faster methods for getting SPI results (460% improvement)
Date: 2017-04-06 02:44:32
Message-ID: a96b9506-5587-7290-9eed-8830ce7f318e@openscg.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/4/17 7:44 PM, Craig Ringer wrote:
> On 5 April 2017 at 08:23, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 5 April 2017 at 08:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> This patch fails to update the documentation at all.
>
> https://www.postgresql.org/docs/devel/static/spi.html

I'll fix that soon.

> missing newline

Fixed.

> +/* Execute a previously prepared plan with a callback Destination */
>
>
> caps "Destination"

Hmm, I capitalized it since DestReceiver is capitalized. I guess it's
best to just drop Destination.

> + // XXX throw error if callback is set

Fixed (opted to just use an Assert).

> +static DestReceiver spi_callbackDR = {
> + donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
> + DestSPICallback
> +};
> Presumably that's a default destination you're then supposed to modify
> with your own callbacks? There aren't any comments to indicate what's
> going on here.

Correct. Added the following comment:

> /*
> * This is strictly a starting point for creating a callback. It should not
> * actually be used.
> */

> Comments on patch 2:
>
> If this is the "bare minimum" patch, what is pending? Does it impose
> any downsides or limits?

No limits. I'm not aware of any downsides.

It's "bare minimum" because a follow-on step is to allow different
methods of returning results. In particular, my testing indicates that
patch 1 + returning a dict of lists (as opposed to the current list of
dicts) results in a 460% improvement vs ~30% with patch 2.

> +/* Get switch execution contexts */
> +extern PLyExecutionContext
> *PLy_switch_execution_context(PLyExecutionContext *new);
>
> Comment makes no sense to me. This seems something like memory context
> switch, where you supply the new and return the old. But the comment
> doesn't explain it at all.

Comment changed to
/* Switch execution context (similar to MemoryContextSwitchTo */

> +void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo);
> +void PLy_CSDestroy(DestReceiver *self);
>
> These are declared in the plpy_spi.c. Why aren't these static?

Derp. Fixed.

> + volatile MemoryContext oldcontext;
> + volatile ResourceOwner oldowner;
> int rv;
> - volatile MemoryContext oldcontext;
> - volatile ResourceOwner oldowner;
>
> Unnecessary code movement.

IMHO it reads better that way. I've left it for now so $COMMITTER can
decide, but if it really bugs you let me know and I'll swap it.

> In PLy_Callback_New, I think your use of a sub-context is sensible. Is
> it necessary to palloc0 though?

Hrm, maybe not... but it seems like cheap insurance considering the
amount of other stuff involved in just starting a new SPI call. And
honestly, I'd rather not mess with it at this point. :) I have added an
XXX comment though.

> +static CallbackState *
> +PLy_Callback_Free(CallbackState *callback)
>
> The code here looks like it could be part of spi.c's callback support,
> rather than plpy_spi specific, since you provide a destroy callback in
> the SPI callback struct.

I added this for use in PG_CATCH() blocks, because it's not clear to me
that the portal gets cleaned up in those cases. It's certainly possible
that it's pointless.

FWIW, I'm pretty sure I copied that pattern from somewhere else...
probably plpgsql or pltcl.

> + /* We need to store this because the TupleDesc the receive
> function gets has no names. */
> + myState->desc = typeinfo;
>
> Is it safe to just store the pointer to the TupleDesc here? What's the lifetime?

Only Portal lifetime.

> + * will clean it up when the time is right. XXX This might result in a leak
> + * if an error happens and the result doesn't get dereferenced.
> + */
> + MemoryContextSwitchTo(TopMemoryContext);
> + result->tupdesc = CreateTupleDescCopy(typeinfo);
>
> especially given this XXX comment...

I've changed the comment to the following. Hopefully this clarifies things:

> /*
> * Save tuple descriptor for later use by result set metadata
> * functions. Save it in TopMemoryContext so that it survives outside of
> * an SPI context. We trust that PLy_result_dealloc() will clean it up
> * when the time is right. The difference between result and everything
> * else is that result needs to survive after the portal is destroyed,
> * because result is what's handed back to the plpython function. While
> * it's tempting to use something other than TopMemoryContext, that won't
> * work: the user could potentially put result into the global dictionary,
> * which means it could survive as long as the session does. This might
> * result in a leak if an error happens and the result doesn't get
> * dereferenced, but if that happens it means the python GC has failed us,
> * at which point we probably have bigger problems.
> *
> * This still isn't perfect though; if something the result tupledesc
> * references has it's OID changed then the tupledesc will be invalid. I'm
> * not sure it's worth worrying about that though.
> */

Updated patches attached, but I still need to update the docs.
--
Jim Nasby, Chief Data Architect, Austin TX
OpenSCG http://OpenSCG.com

Attachment Content-Type Size
0001-Add-SPI_execute_callback.patch text/plain 8.7 KB
0002-Switch-plpython-to-using-SPI_execute_callback.patch text/plain 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-04-06 03:46:07 Re: Faster methods for getting SPI results (460% improvement)
Previous Message Masahiko Sawada 2017-04-06 02:33:22 Interval for launching the table sync worker