Re: Endless loop calling PL/Python set returning functions

From: Alexey Grishchenko <agrishchenko(at)pivotal(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Endless loop calling PL/Python set returning functions
Date: 2016-03-10 16:56:45
Message-ID: CAH38_tkjuyo3yAZrqnZ=-kCb7TOoMJG56rB6TXiK-SQXTf+1mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

No, my fix handles this well.

In fact, with the first function call you allocate global variables
representing Python function input parameters, call the function and
receive iterator over the function results. Then in a series of Postgres
calls to PL/Python handler you just fetch next value from the iterator, you
are not calling the Python function anymore. When the iterator reaches the
end, PL/Python call handler deallocates the global variable representing
function input parameter.

Regardless of the number of parallel invocations of the same function, each
of them in my patch would set its own input parameters to the Python
function, call the function and receive separate iterators. When the first
function's result iterator would reach its end, it would deallocate the
input global variable. But it won't affect other functions as they no
longer need to invoke any Python code. Even if they need - they would
reallocate global variable (it would be set before the Python function
invocation). The issue here was in the fact that they tried to deallocate
the global input variable multiple times independently, which caused error
that I fixed.

Regarding the patch for the second case with recursion - not caching the
"globals" between function calls would have a performance impact, as you
would have to construct "globals" object before each function call. And you
need globals as it contains references to "plpy" module and global
variables and global dictionary ("GD"). I will think on this, maybe there
might be a better design for this scenario. But I still think the second
scenario requires a separate patch

On Thu, Mar 10, 2016 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexey Grishchenko <agrishchenko(at)pivotal(dot)io> writes:
> > One scenario when the problem occurs, is when you are calling the same
> > set-returning function in a single query twice. This way they share the
> > same "globals" which is not a bad thing, but when one function finishes
> > execution and deallocates input parameter's global, the second will fail
> > trying to do the same. I included the fix for this problem in my patch
>
> > The second scenario when the problem occurs is when you want to call the
> > same PL/Python function in recursion. For example, this code will not
> work:
>
> Right, the recursion case is what's not being covered by this patch.
> I would rather have a single patch that deals with both of those cases,
> perhaps by *not* sharing the same dictionary across calls. I think
> what you've done here is not so much a fix as a band-aid. In fact,
> it doesn't even really fix the problem for the two-calls-per-query
> case does it? It'll work if the first execution of the SRF is run to
> completion before starting the second one, but not if the two executions
> are interleaved. I believe you can test that type of scenario with
> something like
>
> select set_returning_function_1(...), set_returning_function_2(...);
>
> regards, tom lane
>

--
Best regards,
Alexey Grishchenko

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-03-10 16:57:11 Re: Add generate_series(date,date) and generate_series(date,date,integer)
Previous Message Tom Lane 2016-03-10 16:48:41 Re: Is there a way around function search_path killing SQL function inlining?