Re: Endless loop calling PL/Python set returning functions

From: Alexey Grishchenko <agrishchenko(at)pivotal(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Endless loop calling PL/Python set returning functions
Date: 2016-03-22 10:15:16
Message-ID: CAH38_tkxLp2fhRjmwf-KWnto8x_r2TaTOsbzCtYNMKKK9jOoSQ@mail.gmail.com
Lists: pgsql-hackers

Alexey Grishchenko <agrishchenko(at)pivotal(dot)io> wrote:

> Alexey Grishchenko <agrishchenko(at)pivotal(dot)io> wrote:
>
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>> Alexey Grishchenko <agrishchenko(at)pivotal(dot)io> writes:
>>> > No, my fix handles this well.
>>> > In fact, with the first function call you allocate global variables
>>> > representing Python function input parameters, call the function and
>>> > receive iterator over the function results. Then in a series of Postgres
>>> > calls to PL/Python handler you just fetch next value from the iterator, you
>>> > are not calling the Python function anymore. When the iterator reaches the
>>> > end, PL/Python call handler deallocates the global variable representing
>>> > function input parameter.
>>>
>>> > Regardless of the number of parallel invocations of the same function, each
>>> > of them in my patch would set its own input parameters to the Python
>>> > function, call the function and receive separate iterators. When the first
>>> > function's result iterator would reach its end, it would deallocate the
>>> > input global variable. But it won't affect other functions as they no
>>> > longer need to invoke any Python code.
>>>
>>> Well, if you think that works, why not undo the global-dictionary changes
>>> at the end of the first call, rather than later? Then there's certainly
>>> no overlap in their lifespan.
>>>
>>> regards, tom lane
>>>
>>
>> Could you elaborate on this? In general, a stack-like solution would
>> work: if a global variable whose name matches an input parameter already
>> exists before the function call, push its value onto a stack, and pop it
>> after the function executes. I will implement it tomorrow and see how it works.
>>
>>
>> --
>>
>> Sent from handheld device
>>
>
> I have improved the code using the proposed approach. The second version of
> the patch is attached.
>
> It works in the following way: the procedure object PLyProcedure stores
> the call stack depth (calldepth field) and the stack itself (argstack
> field). When the call stack depth is zero we do no additional
> processing, so there is no performance impact on existing end-user
> functions. Stack manipulation kicks in only when calldepth is greater than
> zero, which happens either when the function is called recursively through
> SPI, or when the same set-returning function is called two or more times in
> a single query.
>
> Example of multiple calls to an SRF within a single query:
>
> CREATE OR REPLACE FUNCTION func(iter int) RETURNS SETOF int AS $$
> return xrange(iter)
> $$ LANGUAGE plpythonu;
>
> select func(3), func(4);
>
>
> Before the patch, this query caused an endless loop ending in OOM. Now it
> works as it should.
>
> Example of recursion with SPI:
>
> CREATE OR REPLACE FUNCTION test(a int) RETURNS int AS $BODY$
> r = 0
> if a > 1:
>     r = plpy.execute("SELECT test(%d) as a" % (a-1))[0]['a']
> return a + r
> $BODY$ LANGUAGE plpythonu;
>
> select test(10);
>
>
> Before the patch, the query failed with "NameError: global name 'a' is not
> defined". Now it works correctly and returns 55.
>
> --
> Best regards,
> Alexey Grishchenko
>

Hi

Any comments on this patch?

Regarding passing parameters to the Python function using globals: this was
part of the initial design of PL/Python (code
<https://github.com/postgres/postgres/blob/0bef7ba549977154572bdbf5682a32a07839fd82/src/pl/plpython/plpython.c#L783>,
documentation
<http://www.postgresql.org/docs/7.2/static/plpython-using.html>).
Originally you had to work with the "args" global list of input parameters
and could not access named parameters directly. The "args" list still works
in the latest release. Moving away from global input parameters would
require switching to PyObject_CallFunctionObjArgs
<https://docs.python.org/2/c-api/object.html#c.PyObject_CallFunctionObjArgs>,
which should be possible by changing the function declaration to include
the input parameters plus "args" (for backward compatibility). However,
triggers are a bit different: they depend on modifying the global "TD"
dictionary inside the Python function, and they return only the status
string. For them, there is no way to avoid global input parameters without
breaking backward compatibility with existing end-user code.
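
To make the globals issue concrete, here is a rough pure-Python simulation
(no PostgreSQL involved). It is only a sketch of the idea, not the C code
from the patch: make_handler, call and proc_globals are made-up names,
argstack is borrowed from the field name above but its handling here is my
assumption, and call() stands in for plpy.execute("SELECT test(...)").

```python
# Simulation only: the function body sees its argument as a global, the way
# a PL/Python function body does. All helper names are hypothetical.
BODY = """
def test():
    r = 0
    if a > 1:
        r = call(a - 1)   # stands in for the recursive SPI call
    return a + r
"""


def make_handler(save_and_restore):
    proc_globals = {}   # one globals dict shared by every call of the procedure
    argstack = []       # saved argument values of outer calls

    def call(value):
        # If an outer call is still using "a", remember its value first.
        pushed = save_and_restore and "a" in proc_globals
        if pushed:
            argstack.append(proc_globals["a"])
        proc_globals["a"] = value
        try:
            return proc_globals["test"]()
        finally:
            # Drop this call's argument, then restore the outer one if saved.
            proc_globals.pop("a", None)
            if pushed:
                proc_globals["a"] = argstack.pop()

    proc_globals["call"] = call
    exec(BODY, proc_globals)
    return call


def run(save_and_restore):
    try:
        return make_handler(save_and_restore)(10)
    except NameError as exc:
        return "NameError: %s" % exc


if __name__ == "__main__":
    print("without save/restore:", run(False))
    print("with save/restore:   ", run(True))
```

Without the save/restore step the inner call removes "a" while the outer
call still needs it, so the outer call fails with a NameError. With it, the
outer value is put back and the recursion completes.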

--
Best regards,
Alexey Grishchenko
