Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <adoros(at)starfishstorage(dot)com>, <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
Date: 2026-06-01 22:14:34
Message-ID: DIY2500FL0UW.Z4M7NYWMGGA4@gmail.com
Lists: pgsql-bugs

On Thu May 28, 2026 at 12:12 PM -03, Tom Lane wrote:
> "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> writes:
>> On Fri May 15, 2026 at 8:11 AM -03, PG Bug reporting form wrote:
>>> The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
>>> be deleted at any per-call boundary) rather than to
>>> funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).
>
>> Option A seems to fix the issue (see attached patch) but I've found
>> another issue while playing with this that I think it's related:
>> ...
>> This is because when PLy_procedure_delete() is executed on
>> PLy_procedure_get() it also destroy information related with recursive
>> functions, such as "calldepth", "argstack" and "globals" which cause the
>> assert failure Assert(proc->calldepth > 0) on PLy_global_args_pop() when
>> it's executed on PG_CATCH block on PLy_exec_function() or EXC_BAD_ACCESS
>> when accessing "argstack" or "globals".
>
> Yeah. The bigger picture though is: if we are re-entrantly calling
> either a recursive function or a SRF, we should not destroy any of the
> existing state, nor do we want to replace the function body. The only
> way to have sane behavior is to keep executing the same function body
> until the execution instance (recursion level or continued SRF) is
> done. So these concerns about associated state are only part of the
> problem.
>
> plpgsql ran into this years ago, and its solution has been to maintain
> a reference count on each function parsetree and not destroy an
> obsoleted parsetree till the reference count goes to zero. I've had
> in the back of my head that the other PLs need to do likewise, but it
> hasn't gotten to the front of the to-do list, mainly because the other
> PLs are much less used and so field complaints about this have been
> rare. I had hoped also that the language interpreters underlying the
> other PLs might solve some of this for us, but it's unclear to what
> extent they help. Certainly it's not cool to be clobbering our own
> execution state that's outside the language interpreter.
>
> We might want to go as far as converting the other PLs to use the
> utils/cache/funccache.c infrastructure, but perhaps there is a
> less invasive fix. Certainly, a fix based on funccache.c could not
> be back-patched. (On the other hand, given the rarity of complaints,
> perhaps a HEAD-only fix is acceptable.)
>

I've been exploring the funccache.c approach for plpython. The main
challenge is that plpython uses SFRM_ValuePerCall for SRFs, whereas
plpgsql uses SFRM_Materialize. This means plpgsql can simply increment
use_count at the start of plpgsql_call_handler() and decrement it at the
end, since all results are produced in a single call. For plpython,
ExecMakeTableFunctionResult() calls the handler repeatedly, once per
result row, so with the same increment/decrement pattern use_count
would return to zero between calls.

With ValuePerCall, cached_function_compile() may discard and rebuild an
invalidated cache entry, because use_count can be 0 while
ExecMakeTableFunctionResult() is still in the middle of its loop. In
that case, the SRFState of the currently running plpython function
would be lost.
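
To make the sequence concrete, here is a toy Python model of it. This is
only a sketch and not PostgreSQL code: Entry, Cache, lookup() and
call_handler() are hypothetical stand-ins for the cache entry,
cached_function_compile() and the call handler, and the rebuild rule is
a simplifying assumption (an invalidated entry is thrown away only when
its use_count is 0). It models the loss of iteration state, not the
actual use-after-free.

```python
class Entry:
    """Hypothetical cache entry: function body plus per-SRF state."""
    def __init__(self, rows):
        self.rows = rows          # stands in for the compiled function body
        self.use_count = 0
        self.valid = True
        self.iterator = None      # stands in for the SRF state


class Cache:
    """Hypothetical one-slot function cache."""
    def __init__(self, rows):
        self.current_rows = rows  # latest definition of the function
        self.entry = None

    def replace_function(self, rows):
        # Models CREATE OR REPLACE: the cached entry becomes stale.
        self.current_rows = rows
        if self.entry is not None:
            self.entry.valid = False

    def lookup(self):
        e = self.entry
        # Assumed rule: rebuild only if there is no entry, or the entry
        # is stale and nobody is using it.
        if e is None or (not e.valid and e.use_count == 0):
            self.entry = Entry(self.current_rows)
        return self.entry


def call_handler(cache):
    """One value-per-call invocation: returns the next row or None."""
    entry = cache.lookup()
    entry.use_count += 1
    try:
        if entry.iterator is None:
            entry.iterator = iter(entry.rows)
        return next(entry.iterator, None)
    finally:
        entry.use_count -= 1      # back to 0 between calls


def unpinned_run():
    cache = Cache([1, 2, 3])
    out = [call_handler(cache)]
    cache.replace_function([10, 20])   # replaced mid-iteration
    out.append(call_handler(cache))    # entry rebuilt, iterator lost
    return out


def pinned_run():
    cache = Cache([1, 2, 3])
    out = [call_handler(cache)]
    pinned = cache.entry
    pinned.use_count += 1              # hold a reference for the whole SRF
    cache.replace_function([10, 20])
    out.append(call_handler(cache))
    out.append(call_handler(cache))
    pinned.use_count -= 1              # release once the SRF is done
    return out
```

In this model unpinned_run() restarts with the new body after the
replacement, while pinned_run() keeps returning rows from the original
one. Whether holding a reference across calls like this is workable in
plpython is exactly the open question.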

I'm still not sure how to proceed here, but it seems we would need some
refactoring in plpython to make it work with funccache. I'm not sure
whether changing ValuePerCall to Materialize is the way to go, or
whether there's another way to fix this.

I've also tried to fix this without funccache, but it seems we would
end up implementing something similar anyway. That might be a way to
go, but I'm not sure it's the best path either.

Thoughts?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com
