| From: | "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | <adoros(at)starfishstorage(dot)com>, <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct |
| Date: | 2026-06-18 12:13:57 |
| Message-ID: | DJC60EHI0Y34.1ZTU7FGUMW6DQ@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On Wed Jun 17, 2026 at 6:56 PM -03, Tom Lane wrote:
> * Your refactoring to have just one PLy_procedure_get call in
> plpython3_call_handler is no good. You missed the comment
> block just above:
>
> /*
> * Push execution context onto stack. It is important that this get
> * popped again, so avoid putting anything that could throw error between
> * here and the PG_TRY.
> */
> exec_ctx = PLy_push_execution_context(!nonatomic);
>
> + proc = PLy_procedure_get(fcinfo, false);
> +
> PG_TRY();
> {
>
> I counsel putting those PLy_procedure_get calls back where they were.
>
You're right, it was a mistake, it was not my original goal to move
outside of PG_TRY(). I've moved the PLy_procedure_get() call back inside
the PG_TRY(). Since the new signature no longer needs a per-call-context
argument, a single call at the top of the PG_TRY block now covers all
three cases, and exec_ctx->curr_proc is set once right after the lookup.
Let me know if I misunderstood your point.
> * I also question the decision to refactor where/how is_trigger is
> computed; that doesn't seem necessary to the purposes of the patch,
> nor is it a clear improvement. I'd just as soon leave that
> mechanism alone as much as we can.
I've restored PLy_procedure_is_trigger() and the validator uses it again
exactly as before, instead of the inlined prorettype checks. The one
unavoidable change that it seems to me is that the trigger type is now
determined inside the compile callback rather than passed in as a
PLyTrigType argument — that's forced by the funccache API, since
cached_function_compile() takes the FunctionCallInfo and the procedure
is created from within the callback instead of PLy_procedure_get(). Or
I'm missing something?
> (Alternative thought: should we rely on the isTrigger/isEventTrigger
> bools that funccache.c sets up for us? I'm not quite sure if getting
> friendly with struct CachedFunctionHashKey is a good idea or not.)
>
I left the callback using CALLED_AS_TRIGGER() / CALLED_AS_EVENT_TRIGGER()
rather than reaching into CachedFunctionHashKey. That keeps us off from
funccache.c internals and matches what plpgsql_compile_callback() does,
which seems to me the safer way to go. What do you think?
> * I find it confusing that you called "PLyProcedureCache *" variables
> "pcache" in some places and "proc" in others. The latter choice seems
> poor because mostly "proc" is a PLyProcedure pointer. Using "proc"
> leads to constructions like "proc->proc", which I don't find
> intelligible.
>
Fixed. Definitely agree, oversight from my side.
> * The new code could do with more comments. I realize that plpython
> is poorly commented in many places, but let's see if we can leave it
> better than we found it.
>
Added header comments to the new PLy_compile_callback and
PLy_delete_callback, expanded the validator comment about why the fake
fcinfo context is built, and expanded the SRF first-call-setup comment
to explain the ValuePerCall model, the per-call-site cache, and the
shutdown-callback handling.
I've also added a regression test, not sure if there is a better way to
exercise this fix but this test crash without this patch applied.
---
On top of your points, I did another self-review pass over v2 and found
a possible pre-existing problem in v1 in the way the patch handled SRF
cleanup, which I've also fixed in v2.
The patch had switched the set-returning-function cleanup from the
original MemoryContextRegisterResetCallback to a
RegisterExprContextCallback (ShutdownPLyFunction), modeled on what
functions.c does for SQL functions. But that copies a property that I
don't that apply for PL/Python: ShutdownExprContext() does not invoke
ExprContext callbacks during an error abort (it only frees the callback
list), and functions.c is fine with that because, as its comment notes,
"transaction abort will take care of releasing executor resources."
PL/Python's resource is a Python refcount, and transaction abort does
not release those. So if a SETOF function was left partially iterated
and the surrounding query then errored, e.g.
CREATE OR REPLACE FUNCTION mysrf() RETURNS SETOF int LANGUAGE plpython3u AS $$
return [1,2,3,4,5]
$$;
SELECT mysrf() / 0;
the iterator's references were leaked for the life of the session. The
original code didn't have this problem because a memory-context reset
callback does run during abort.
Rather than reintroduce a second mechanism, v2 reuses the memory-context
callback that's already there for reference counting.
RemovePLyProcedureCache is registered on the FmgrInfo's fn_mcxt and
therefore runs on abort; it now also releases any Python state left
behind by an interrupted SRF. ShutdownPLyFunction is kept for the cases
it does handle correctly.
---
Thanks for the review! I've tried to address all your points in the
attached v2.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-plpython-Use-funccache.c-infrastructure-for-proce.patch | text/plain | 38.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ayush Tiwari | 2026-06-18 13:36:52 | Re: BUG #19524: In `contrib/btree_gist` float4/float8 GiST index operations, handling NaN values with raw C operator |
| Previous Message | Amit Langote | 2026-06-18 10:57:07 | Re: BUG #19484: Segmentation fault triggered by FDW |