Re: Endless loop calling PL/Python set returning functions

From: Alexey Grishchenko <agrishchenko(at)pivotal(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Endless loop calling PL/Python set returning functions
Date: 2016-03-10 16:20:10
Message-ID: CAH38_tkimV2nJu13M8wZGFFDv-4riLB_LB0Zd2hKVCLRTHcXDw@mail.gmail.com
Lists: pgsql-hackers

I agree that passing function parameters through globals is not the best
solution.

It works in the following way: custom code (in our case, a Python function
invocation) is executed in Python with PyEval_EvalCode
<https://docs.python.org/2/c-api/veryhigh.html>. As input to this C
function you specify the dictionary of globals that will be available to
the code. The PLyProcedure structure stores "PyObject *globals;", which is
the dictionary of globals for a specific function. So SPI works fine, as
each function has a separate dictionary of globals and they don't conflict
with each other.
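
As a rough pure-Python analogy (not the actual PL/Python C code), the
built-in exec() plays the role of PyEval_EvalCode here: it runs a code
object against a globals dictionary supplied by the caller. The names
globals_f and globals_g are hypothetical stand-ins for the "globals" of
two different PLyProcedure objects.

```python
# Pure-Python analogy: each "procedure" owns its own globals dictionary,
# much like PLyProcedure holds its own "PyObject *globals".
code = compile("result = a * 2", "<procedure>", "exec")

globals_f = {"a": 1}    # hypothetical globals of function f
globals_g = {"a": 100}  # hypothetical globals of function g

# Run the same code object against two separate dictionaries.
exec(code, globals_f)
exec(code, globals_g)

# The dictionaries are independent, so the two values of "a" never collide.
print(globals_f["result"], globals_g["result"])
```

Since nothing is shared between the two dictionaries, one function calling
a different function through SPI does not disturb the caller's parameters.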

One scenario where the problem occurs is when you call the same
set-returning function twice in a single query. Both calls then share the
same "globals", which is not a bad thing in itself, but when one call
finishes execution and deletes the input parameters from the globals, the
second one fails trying to do the same. I included the fix for this
problem in my patch.
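
The double deletion can be sketched in plain Python. This is only an
illustration of the idea, not the C code from the patch; the helper names
delete_args_unguarded and delete_args_guarded are hypothetical.

```python
def delete_args_unguarded(globals_dict, arg_names):
    # Deletes each argument unconditionally; raises KeyError if it is gone.
    for name in arg_names:
        del globals_dict[name]


def delete_args_guarded(globals_dict, arg_names):
    # Deletes an argument only if it is still present in the dictionary.
    for name in arg_names:
        if name in globals_dict:
            del globals_dict[name]


# Two calls of the same function share one globals dictionary.
shared_globals = {"a": 42}
delete_args_unguarded(shared_globals, ["a"])      # first call cleans up
try:
    delete_args_unguarded(shared_globals, ["a"])  # second call fails
    second_delete_failed = False
except KeyError:
    second_delete_failed = True

# With the membership check, cleaning up twice is harmless.
shared_globals = {"a": 42}
delete_args_guarded(shared_globals, ["a"])
delete_args_guarded(shared_globals, ["a"])
```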

The second scenario where the problem occurs is when you call the same
PL/Python function recursively. For example, this code will not work:

create or replace function test(a int) returns int as $BODY$
r = 0
if a > 1:
    r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
return a + r
$BODY$ language plpythonu;

select test(10);

The function "test" has a single PLyProcedure object allocated to handle
it, and thus a single "globals" dictionary. When the inner function call
finishes, it removes the key "a" from the dictionary, and the outer
call fails with "NameError: global name 'a' is not defined" when it
tries to execute "return a + r".
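
The failure can be reproduced outside PostgreSQL with a small simulation.
This is a sketch under assumptions: call() is a hypothetical stand-in for
the PL/Python call handler (set the argument as a global, run the body,
delete the argument), and the recursion goes through call() directly
instead of plpy.execute().

```python
# The function body reads its argument "a" as a global, as in PL/Python.
BODY = """
def body():
    r = 0
    if a > 1:
        r = call(a - 1)
    return a + r
"""

# One dictionary for the function, shared by every nested invocation.
shared_globals = {}


def call(arg):
    # Hypothetical call handler: publish the argument, run, then clean up.
    shared_globals["a"] = arg
    try:
        return shared_globals["body"]()
    finally:
        shared_globals.pop("a", None)


shared_globals["call"] = call
exec(BODY, shared_globals)

error = None
try:
    call(3)
except NameError as exc:
    # The inner call removed "a" before the outer call reached its return.
    error = str(exc)

print(error)
```

A non-recursive call such as call(1) still works, because nothing removes
"a" before the body returns.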

But the second issue is a separate story, and I think it is worth a
separate patch.

On Thu, Mar 10, 2016 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexey Grishchenko <agrishchenko(at)pivotal(dot)io> writes:
> > There is a bug in implementation of set-returning functions in PL/Python.
> > When you call the same set-returning function twice in a single query,
> > the executor falls to infinite loop which causes OOM.
>
> Ugh.
>
> > Another issue with calling the same set-returning function twice in the
> > same query, is that it would delete the input parameter of the function
> > from the global variables dictionary at the end of execution. With
> > calling the function twice, this code attempts to delete the same entry
> > from global variables dict twice, thus causing KeyError. This is why the
> > function PLy_function_delete_args is modified as well to check whether
> > the key we intend to delete is in the globals dictionary.
>
> That whole business with putting a function's parameters into a global
> dictionary makes me itch. Doesn't it mean problems if one plpython
> function calls another (presumably via SPI)?
>
> regards, tom lane
>

--
Best regards,
Alexey Grishchenko
