BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: adoros(at)starfishstorage(dot)com
Subject: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
Date: 2026-05-15 11:11:37
Message-ID: 19480-f1f9fdce30462fc4@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 19480
Logged by: Andrzej Doros
Email address: adoros(at)starfishstorage(dot)com
PostgreSQL version: 17.9
Operating system: Ubuntu 22.04.5 LTS (x86_64), kernel 5.15, glibc 2.
Description:

PostgreSQL version: 17.9 (production crash), confirmed identical on 17.10
OS: Ubuntu 22.04.5 LTS, x86_64, kernel 5.15, glibc 2.35
Package: postgresql-plpython3-17 from pgdg apt repository

DESCRIPTION
-----------

A PL/Python set-returning function (SRF) crashes the backend with SIGSEGV
when another session executes CREATE OR REPLACE FUNCTION (or ALTER FUNCTION)
on the same function while the SRF is mid-iteration.

This is a use-after-free. srfstate->savedargs is allocated inside proc->mcxt
by PLy_function_save_args() (plpy_exec.c:503). On each per-call SRF
invocation, plpython3_call_handler calls PLy_procedure_get(), which may call
PLy_procedure_delete(old_proc) -> MemoryContextDelete(old_proc->mcxt) if the
function's pg_proc row has changed (different xmin or ctid). After that,
srfstate->savedargs is a dangling pointer: it is not cleared. The next
PLy_function_restore_args() reads freed memory:

if (srfstate->savedargs)    /* non-NULL dangling pointer */
    PLy_function_restore_args(proc, srfstate->savedargs);    /* reads freed mem */

Inside PLy_function_restore_args (plpy_exec.c:551):

for (i = 0; i < savedargs->nargs; i++)    /* nargs from freed memory */
{
    if (proc->argnames[i] && ...)
        PyDict_SetItemString(..., proc->argnames[i], ...);

When savedargs->nargs is garbage (e.g. 2056017128 in two production core
dumps), the loop runs past the end of proc->argnames[] (proc->nargs is 1),
so proc->argnames[i] reads invalid pointers. One of them is eventually passed
to PyDict_SetItemString -> PyUnicode_FromString -> strlen -> SIGSEGV.
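The rebuild decision described above can be modeled in isolation. The
following is a minimal sketch, not PostgreSQL code: ToyXid, ToyTid,
ToyCachedProc, ToyProcRow and toy_proc_valid are hypothetical stand-ins for
the transaction-ID and item-pointer types and for the check the report
describes, where a cached procedure is reused only while both the xmin and
the ctid of its pg_proc row are unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and ItemPointerData. */
typedef uint32_t ToyXid;
typedef struct
{
    uint32_t block;     /* heap block number */
    uint16_t offset;    /* line pointer within the block */
} ToyTid;

/* Hypothetical: row identity captured when the cached proc was built. */
typedef struct
{
    ToyXid fn_xmin;
    ToyTid fn_tid;
} ToyCachedProc;

/* Hypothetical: row identity as currently seen in the catalog. */
typedef struct
{
    ToyXid xmin;
    ToyTid ctid;
} ToyProcRow;

static bool
toy_tid_equal(ToyTid a, ToyTid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/*
 * Reuse the cached entry only if both xmin and ctid still match.  When this
 * returns false, the caller builds a new entry and deletes the old entry's
 * memory context, which is the step that frees srfstate->savedargs.
 */
static bool
toy_proc_valid(const ToyCachedProc *proc, const ToyProcRow *row)
{
    assert(row != NULL);
    if (proc == NULL)
        return false;
    return proc->fn_xmin == row->xmin && toy_tid_equal(proc->fn_tid, row->ctid);
}
```

Because CREATE OR REPLACE FUNCTION writes a new row version, at least xmin
changes and the check fails even when the new function body is textually
identical, as in the reproduction below.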

CRASH STACK (two identical core dumps from production, PG 17.9, Ubuntu 22.04)
-----------------------------------------------------------------------------

#0 __strlen_evex()
#1 PyUnicode_FromString(u=0x69ffff0000)
#2 PyDict_SetItemString(...)
#3 PLy_function_restore_args(proc=..., savedargs=...)
#4 PLy_exec_function(...)
#5 plpython3_call_handler(...)
#6 fmgr_security_definer(...)
#7 ExecMakeTableFunctionResult(...)

State from the newer core dump:

proc->proname = "tags_report_plpython"
proc->nargs = 1
proc->argnames[0]= "flavour"
savedargs->nargs = 2056017128 <- should be 1; contains garbage
savedargs->namedargs[0] = 'tags' <- still valid (not yet overwritten)
i = 4 <- loop has iterated far past argnames[]

TRIGGER CONDITION
-----------------

The pg_proc invalidation reaches Session A's backend when
AcceptInvalidationMessages() is called. This happens when Session A's Python
code calls plpy.execute() with a statement that acquires a NEW relation lock
(e.g. CREATE TEMP TABLE, or any table not previously locked in this
statement). Simply calling plpy.execute("SELECT 1") is not sufficient: the
lock on pg_proc is already held, so subsequent lock requests are served from
the per-process lock table without invoking AcceptInvalidationMessages.
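The lock fast path described above decides whether the invalidation is seen
at all. The sketch below is a simplified toy model built on the report's
statement that invalidation processing runs only when the backend takes a
relation lock it does not already hold; ToyLockTable, toy_lock_relation and
toy_accept_invalidation_messages are hypothetical and deliberately much
simpler than PostgreSQL's lock manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_LOCKS 16

/* Hypothetical per-backend table of relation OIDs already locked. */
typedef struct
{
    uint32_t held[TOY_MAX_LOCKS];
    int      nheld;
    int      accept_calls;   /* how many times invalidations were processed */
} ToyLockTable;

static bool
toy_lock_held(const ToyLockTable *t, uint32_t relid)
{
    for (int i = 0; i < t->nheld; i++)
        if (t->held[i] == relid)
            return true;
    return false;
}

/* Hypothetical stand-in for AcceptInvalidationMessages(): only counts. */
static void
toy_accept_invalidation_messages(ToyLockTable *t)
{
    t->accept_calls++;
}

/*
 * Lock a relation.  A repeat request for a lock this backend already holds
 * is satisfied locally and skips invalidation processing, so a pending
 * pg_proc invalidation stays queued.
 */
static void
toy_lock_relation(ToyLockTable *t, uint32_t relid)
{
    if (toy_lock_held(t, relid))
        return;
    assert(t->nheld < TOY_MAX_LOCKS);
    t->held[t->nheld++] = relid;
    toy_accept_invalidation_messages(t);
}
```

In this model, statements that touch only already-locked relations never
reach toy_accept_invalidation_messages, which is why the reproduction
creates a fresh temp table on every iteration.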

In production the trigger is autovacuum on pg_proc (which moves the tuple's
ctid) or any concurrent DDL from another session. Long-running SRFs (hours)
are much more likely to hit this window.

STEPS TO REPRODUCE
------------------

Requires two concurrent sessions and PostgreSQL with plpython3u.

Session A — start and leave running:

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
RETURNS TABLE (i BIGINT) AS $$
import time
for i in range(100):
    # CREATE TEMP TABLE acquires a new relation lock each iteration,
    # which causes AcceptInvalidationMessages to be called.
    plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
    plpy.execute(f"DROP TABLE _rt_{i}")
    time.sleep(0.3)
    yield i
$$ LANGUAGE plpython3u VOLATILE;

SELECT count(*) FROM repro_srf('test');

Session B — while Session A is running (after ~2 seconds):

CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
RETURNS TABLE (i BIGINT) AS $$
import time
for i in range(100):
    plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
    plpy.execute(f"DROP TABLE _rt_{i}")
    time.sleep(0.3)
    yield i
$$ LANGUAGE plpython3u VOLATILE;

NOTE: In a minimal test without memory pressure, the freed savedargs memory
is often not overwritten quickly enough to produce a crash; savedargs->nargs
happens to retain its correct value of 1 and restore_args succeeds. Under
production load (long-running SRF, many Python allocations), the freed region
is overwritten and the crash occurs.

The crash can be triggered deterministically with gdb by setting
savedargs->nargs to a large value immediately after PLy_procedure_delete
fires (see the gdb script below). This produces the same crash stack as seen
in production.

GDB CONFIRMATION (PostgreSQL 17.10)
-----------------------------------

The following gdb session was used to confirm the exact sequence:

(gdb) b PLy_procedure_delete
(gdb) commands 1
> printf "DELETE proname=%s mcxt=%p\n", proc->proname, proc->mcxt
> set $corrupt_next = 1
> c
> end
(gdb) b PLy_function_restore_args
(gdb) commands 2
> if $corrupt_next
> set {int}((long)savedargs + 24) = 2056017128
> set $corrupt_next = 0
> end
> c
> end

Output:

DELETE proname=repro_srf mcxt=0x5686641e1b20
[PLy_function_restore_args fires with savedargs=0x5686641e28e8]
[nargs set to 2056017128]
Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 ()

PostgreSQL log:
server process (PID 366) was terminated by signal 11: Segmentation fault
all server processes terminated; reinitializing
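The offset 24 in the gdb script is the position of nargs inside the
saved-arguments struct on x86_64. The sketch below mirrors the field order
assumed for PLySavedArgs in src/pl/plpython/plpy_procedure.h (next, args, td,
nargs, then the flexible namedargs array), with void * in place of
PyObject * so it compiles without Python headers. ToySavedArgs and
toy_nargs_offset are hypothetical names, and the mirror should be checked
against the headers of the build being debugged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical mirror of the assumed PLySavedArgs layout; void * replaces
 * PyObject * so no Python headers are needed.
 */
typedef struct ToySavedArgs
{
    struct ToySavedArgs *next;   /* linked-list pointer */
    void       *args;            /* "args" element of the globals dict */
    void       *td;              /* "TD" element, if trigger */
    int         nargs;           /* length of namedargs */
    void       *namedargs[];     /* flexible array member */
} ToySavedArgs;

/* nargs follows three pointers, so no padding precedes it. */
static_assert(offsetof(ToySavedArgs, nargs) == 3 * sizeof(void *),
              "nargs follows next, args and td");

/* Byte offset that a raw gdb write must target to hit nargs. */
static size_t
toy_nargs_offset(void)
{
    return offsetof(ToySavedArgs, nargs);
}
```

When debug symbols for plpython3.so are available, writing the field by name
(set var savedargs->nargs = 2056017128) avoids depending on the hard-coded
offset.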

AFFECTED CODE
-------------

src/pl/plpython/plpy_exec.c, lines 503-506:
PLy_function_save_args allocates savedargs in proc->mcxt

src/pl/plpython/plpy_exec.c, lines 117-119:
PLy_function_restore_args is called with potentially dangling savedargs
(no check whether proc was rebuilt since savedargs was created)

src/pl/plpython/plpy_procedure.c, line 405 (PLy_procedure_delete):
MemoryContextDelete(proc->mcxt) frees savedargs without nulling
srfstate->savedargs

PROPOSED FIX
------------

The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
be deleted at any per-call boundary) rather than to
funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).

Option A — allocate savedargs in funcctx->multi_call_memory_ctx:
Change PLy_function_save_args to accept a MemoryContext parameter and pass
funcctx->multi_call_memory_ctx from PLy_exec_function. The saved PyObject*
references are valid regardless of which MemoryContext holds the struct.
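A toy model of Option A's ownership change follows. ToyContext, toy_alloc,
toy_owns, toy_context_delete, ToyArgsSnapshot and toy_save_args are
hypothetical stand-ins for a MemoryContext and for PLy_function_save_args
with the proposed extra context parameter; in PostgreSQL itself the
allocation would go through the memory-context API with
funcctx->multi_call_memory_ctx as the owner. The sketch only shows that the
saved struct's lifetime follows the context passed in, not proc->mcxt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_CHUNKS 8

/* Hypothetical memory context: frees everything it owns when deleted. */
typedef struct
{
    void *chunks[TOY_MAX_CHUNKS];
    int   nchunks;
} ToyContext;

static void *
toy_alloc(ToyContext *cxt, size_t size)
{
    void *p;

    assert(cxt->nchunks < TOY_MAX_CHUNKS);
    p = calloc(1, size);
    if (p != NULL)
        cxt->chunks[cxt->nchunks++] = p;
    return p;
}

static bool
toy_owns(const ToyContext *cxt, const void *p)
{
    for (int i = 0; i < cxt->nchunks; i++)
        if (cxt->chunks[i] == p)
            return true;
    return false;
}

/* Free every chunk; the toy context itself stays reusable afterwards. */
static void
toy_context_delete(ToyContext *cxt)
{
    for (int i = 0; i < cxt->nchunks; i++)
        free(cxt->chunks[i]);
    cxt->nchunks = 0;
}

/* Hypothetical, reduced saved-arguments struct. */
typedef struct
{
    int nargs;
} ToyArgsSnapshot;

/* Option A: the caller chooses the owning context instead of proc->mcxt. */
static ToyArgsSnapshot *
toy_save_args(ToyContext *owner, int nargs)
{
    ToyArgsSnapshot *saved = toy_alloc(owner, sizeof(ToyArgsSnapshot));

    if (saved != NULL)
        saved->nargs = nargs;
    return saved;
}
```

Deleting the proc's context mid-iteration then leaves the snapshot
untouched; it is released only when the SRF-lifetime context goes away.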

Option B — detect proc rebuild and discard stale savedargs:
After PLy_procedure_get returns a new proc, check whether it differs from
the proc that created srfstate->savedargs. If so, discard savedargs
(PLy_function_drop_args or simply set to NULL) and skip the restore.
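Option B can be sketched by recording which procedure build produced
savedargs. The version below uses a generation counter (hypothetical types
ToyProc and ToySrfState, hypothetical fields generation and
saved_generation) rather than comparing PLyProcedure pointers, because a
rebuilt proc could in principle be allocated at the same address as the
freed one. Once the old context has been deleted, the stale struct must not
be dereferenced, so the sketch only clears the pointer; passing the freed
struct to a cleanup routine such as PLy_function_drop_args would itself read
freed memory, and the saved Python references are not released on this
path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached procedure: generation increments on every rebuild. */
typedef struct
{
    uint64_t generation;
} ToyProc;

/* Hypothetical SRF state carrying the generation that produced savedargs. */
typedef struct
{
    void    *savedargs;
    uint64_t saved_generation;
} ToySrfState;

/*
 * Decide whether savedargs may be restored for the proc returned by the
 * cache lookup.  A mismatch means savedargs lived in a context that has
 * already been deleted, so the pointer is dropped without being read.
 */
static bool
toy_may_restore(ToySrfState *st, const ToyProc *proc)
{
    assert(proc != NULL);
    if (st->savedargs == NULL)
        return false;
    if (st->saved_generation != proc->generation)
    {
        st->savedargs = NULL;   /* stale: never dereference it */
        return false;
    }
    return true;
}
```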
