Segfault due to NULL ParamExecData value

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Segfault due to NULL ParamExecData value
Date: 2025-12-04 14:25:55
Message-ID: CAO6_XqriD8thgACyTxDsUAaCVMJXKn3=c=cW56giN4_-syh_hQ@mail.gmail.com
Lists: pgsql-bugs

Hi,

We have had multiple segfaults on PostgreSQL 14.17. All coredumps show the
following backtrace:

#0: postgres`toast_raw_datum_size(value=0) at detoast.c:550:6
#1: postgres`textne(fcinfo=0x0000c823e4759338) at varlena.c:1848:10
#2: postgres`ExecInterpExpr(state=0x0000c823e4758a48,
econtext=0x0000c823e4757b88, isnull=<unavailable>) at execExprInterp.c:749:8
#3: postgres`ExecScan at executor.h:342:13
#4: postgres`ExecScan at executor.h:411:8
#5: postgres`ExecScan(node=0x0000c823e4757978,
accessMtd=(postgres`FunctionNext at nodeFunctionscan.c:61:1),
recheckMtd=(postgres`FunctionRecheck)) at execScan.c:226:23
#6: postgres`ExecSubPlan [inlined] ExecProcNode(node=0x0000c823e4757978)
at executor.h:260:9
#7: postgres`ExecSubPlan at nodeSubplan.c:302:14
#8: postgres`ExecSubPlan(node=0x0000c823e47814e8,
econtext=0x0000c823e475a658, isNull=0x0000c823e47814c0) at
nodeSubplan.c:89:12
#9: postgres`ExecInterpExpr at execExprInterp.c:3954:18
#10: postgres`ExecInterpExpr(state=0x0000c823e47813c0,
econtext=0x0000c823e475a658, isnull=<unavailable>) at
execExprInterp.c:1576:4
#11: postgres`ExecNestLoop [inlined]
ExecEvalExprSwitchContext(isNull=0x0000ffffebb9d637,
econtext=0x0000c823e475a658, state=<unavailable>) at executor.h:342:13
#12: postgres`ExecNestLoop [inlined] ExecProject(projInfo=<unavailable>) at
executor.h:376:9
#13: postgres`ExecNestLoop(pstate=<unavailable>) at nodeNestloop.c:241:12
#14: postgres`EvalPlanQual at executor.h:260:9
#15: postgres`ExecUpdate(mtstate=0x0000c823e4651a98,
resultRelInfo=0x0000c823e4651ca8, tupleid=0x0000ffffebb9d858,
oldtuple=0x0000000000000000, slot=<unavailable>,
planSlot=0x0000c823e4661800, epqstate=0x0000c823e4651b80,
estate=0x0000c823e46ace18, canSetTag=<unavailable>) at
nodeModifyTable.c:2007:18
#16: postgres`ExecModifyTable(pstate=0x0000c823e4651a98) at
nodeModifyTable.c:2760:12
#17: postgres`standard_ExecutorRun [inlined]
ExecProcNode(node=0x0000c823e4651a98) at executor.h:260:9
#18: postgres`standard_ExecutorRun at execMain.c:1555:10
#19: postgres`standard_ExecutorRun(queryDesc=0x0000c823e45d51a0,
direction=<unavailable>, count=0, execute_once=<unavailable>) at
execMain.c:360:3

textne's second argument is a NULL Datum, leading to the segfault in
toast_raw_datum_size. All segfaults happened while running the following query:

WITH RECURSIVE
params AS (SELECT $1::text AS schema, $2::text AS name, $3::text AS
version),
seed AS (SELECT p.schema || E'\t^\t' || p.name AS node FROM params p),
reach AS (SELECT v."schema" || E'\t^\t' || v."name" AS node
FROM definitions v WHERE (SELECT node FROM seed) = ANY(v.used_tables)
UNION
SELECT v."schema" || E'\t^\t' || v."name" AS node
FROM definitions v, reach r WHERE r.node = ANY(v.used_tables)),
to_update AS (SELECT DISTINCT split_part(r.node, E'\t^\t', 1) AS "schema",
split_part(r.node, E'\t^\t', 2) AS "name" FROM reach r),
kv AS (SELECT (p.schema || '.' || p.name) AS key_str,
(p.schema || '.' || p.name || ':' || p.version) AS new_entry
FROM params p)
UPDATE definitions v SET dependencies = array_cat(
COALESCE(ARRAY(SELECT e
FROM unnest(COALESCE(v.dependencies, ARRAY[]::text[])) AS e
WHERE split_part(e, ':', 1) <> (SELECT key_str FROM kv)
), ARRAY[]::text[]
), ARRAY[(SELECT new_entry FROM kv)]
)
FROM to_update u
WHERE v."schema" = u."schema" AND v."name" = u."name";

The query has the following plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Update on definitions v (cost=63318199175.74..63318200084.69 rows=0
width=0)
CTE params
-> Result (cost=0.00..0.01 rows=1 width=96)
CTE reach
-> Recursive Union (cost=0.03..61108645382.88 rows=73651793079
width=32)
-> Seq Scan on definitions v_1 (cost=0.03..370199.45
rows=27139 width=32)
Filter: ($2 = ANY (used_tables))
InitPlan 2 (returns $2)
-> CTE Scan on params p (cost=0.00..0.03 rows=1
width=32)
-> Nested Loop (cost=0.00..5963523932.18 rows=7365176594
width=32)
Join Filter: (r_1.node = ANY (v_2.used_tables))
-> Seq Scan on definitions v_2 (cost=0.00..363124.99
rows=555099 width=315)
-> WorkTable Scan on reach r_1 (cost=0.00..5427.80
rows=271390 width=32)
CTE kv
-> CTE Scan on params p_1 (cost=0.00..0.04 rows=1 width=64)
InitPlan 7 (returns $7)
-> CTE Scan on kv kv_1 (cost=0.00..0.02 rows=1 width=32)
-> Nested Loop (cost=2209553792.80..2209554701.75 rows=100 width=126)
-> Subquery Scan on u (cost=2209553792.37..2209553797.37
rows=200 width=152)
-> HashAggregate (cost=2209553792.37..2209553795.37
rows=200 width=64)
Group Key: split_part(r.node, ' ^ '::text,
1), split_part(r.node, ' ^ '::text, 2)
-> CTE Scan on reach r (cost=0.00..1841294826.97
rows=73651793079 width=64)
-> Index Scan using definitions_schema_name_idx on definitions v
(cost=0.42..4.40 rows=3 width=68)
Index Cond: ((schema = u.schema) AND (name = u.name))
SubPlan 6
-> Function Scan on unnest e (cost=0.02..0.17 rows=9 width=32)
Filter: (split_part(e, ':'::text, 1) <> $5)
InitPlan 5 (returns $5)
-> CTE Scan on kv (cost=0.00..0.02 rows=1 width=32)

Unfortunately, I wasn't able to reproduce the segfault, so the only
information available is from the coredumps.

The failure happens when the textne in 'WHERE split_part(e, ':', 1) <> (SELECT
key_str FROM kv)' is evaluated. Looking at the ExprState, there are 7 steps
with the following opcodes:
0: EEOP_SCAN_FETCHSOME
1: EEOP_SCAN_VAR
2: EEOP_FUNCEXPR_STRICT
3: EEOP_PARAM_EXEC
4: EEOP_FUNCEXPR_STRICT
5: EEOP_QUAL
6: EEOP_DONE

Step 3 runs the subplan InitPlan 5 to fill arg2 for textne (step 4). Looking
at step 3's param:
p state->steps[3].d.param
((unnamed struct)) $219 = (paramid = 5, paramtype = 25)

Then, looking at the matching ParamExecData:
p econtext->ecxt_param_exec_vals[5]
(ParamExecData) $220 = (execPlan = 0x0000000000000000, value = 0, isnull =
false)

When looking at the matching WAL records, we also see at least two updates
before the segfault is triggered:
rmgr: Heap len (rec/tot): 59/ 2139, tx: 2549003939, lsn:
B4D/21956518, prev B4D/219564E8, desc: LOCK off 1: xid 2549003939: flags
0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/16386/16899 blk
160730 FPW
rmgr: Heap len (rec/tot): 2055/ 2055, tx: 2549003939, lsn:
B4D/21956D78, prev B4D/21956518, desc: UPDATE off 1 xmax 2549003939 flags
0x11 KEYS_UPDATED ; new off 3 xmax 0, blkref #0: rel 1663/16386/16899 blk
160730
rmgr: Btree len (rec/tot): 55/ 1971, tx: 2549003939, lsn:
B4D/21957698, prev B4D/21957668, desc: INSERT_LEAF off 122, blkref #0: rel
1663/16386/16905 blk 3261 FPW
rmgr: Btree len (rec/tot): 104/ 104, tx: 2549003939, lsn:
B4D/21957E50, prev B4D/21957698, desc: INSERT_LEAF off 18, blkref #0: rel
1663/16386/16907 blk 1517
rmgr: Btree len (rec/tot): 104/ 104, tx: 2549003939, lsn:
B4D/21957EB8, prev B4D/21957E50, desc: INSERT_LEAF off 90, blkref #0: rel
1663/16386/53784856 blk 1020
rmgr: Btree len (rec/tot): 55/ 1156, tx: 2549003939, lsn:
B4D/21957F20, prev B4D/21957EB8, desc: INSERT_LEAF off 11, blkref #0: rel
1663/16386/57258051 blk 7015 FPW
rmgr: Btree len (rec/tot): 55/ 209, tx: 2549003939, lsn:
B4D/219583C0, prev B4D/21957F20, desc: INSERT_LEAF off 4, blkref #0: rel
1663/16386/57459940 blk 1921 FPW
rmgr: Gin len (rec/tot): 566/ 566, tx: 2549003939, lsn:
B4D/21958498, prev B4D/219583C0, desc: UPDATE_META_PAGE , blkref #0: rel
1663/16386/57459942 blk 0, blkref #1: rel 1663/16386/57459942 blk 808
rmgr: Heap len (rec/tot): 54/ 54, tx: 2549003939, lsn:
B4D/25A4C7F0, prev B4D/25A4C7B8, desc: LOCK off 9: xid 2549003939: flags
0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16386/16899 blk 40
rmgr: Heap len (rec/tot): 1827/ 1827, tx: 2549003939, lsn:
B4D/25A4CAC8, prev B4D/25A4CA88, desc: HOT_UPDATE off 9 xmax 2549003939
flags 0x10 ; new off 10 xmax 2549003939, blkref #0: rel 1663/16386/16899
blk 40
rmgr: Heap2 len (rec/tot): 56/ 56, tx: 2549003939, lsn:
B4D/25A4D1F0, prev B4D/25A4CAC8, desc: PRUNE latestRemovedXid 0 nredirected
0 ndead 0, blkref #0: rel 1663/16386/16899 blk 100
rmgr: Heap len (rec/tot): 54/ 54, tx: 2549003939, lsn:
B4D/25A4D228, prev B4D/25A4D1F0, desc: LOCK off 1: xid 2549003939: flags
0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/16386/16899 blk 19749

On the log side, we see row-lock contention happening before the segfault:
2025-11-04T17:02:56.507Z,process 289871 still waiting for ShareLock on
transaction 2549003939 after 1000.053 ms
2025-11-04T17:02:56.507Z,Process holding the lock: 292365. Wait queue:
289871.
2025-11-04T17:02:58.938Z,process 292365 still waiting for ShareLock on
transaction 2549003931 after 1000.052 ms
2025-11-04T17:02:58.938Z,Process holding the lock: 292716. Wait queue:
285801, 292365.
2025-11-04T17:02:58.938Z,while updating tuple (40,8) in relation
"definitions"
2025-11-04T17:03:00.041Z,process 292365 acquired ShareLock on transaction
2549003931 after 2102.985 ms
2025-11-04T17:03:00.041Z,while updating tuple (40,8) in relation
"definitions"
2025-11-04T17:03:00.041Z,process 283964 acquired ExclusiveLock on tuple
(40,8) of relation 16899 of database 16386 after 1997.621 ms
2025-11-04T17:03:00.201Z,server process (PID 292365) was terminated by
signal 11: Segmentation fault

So it looks like the ParamExec for InitPlan 5 was correctly executed (since
execPlan is NULL), and the value was probably used during the first two
updates. But for the third update, the ParamExecData's value was NULL, leading
to the segfault.
All coredumps (or rather, their associated WAL records) show a similar pattern
of two updates before the segfault.
I haven't been able to reproduce the segfault, so I couldn't pinpoint what
could have set the ParamExecData's value to NULL.

Regards,
Anthonin Bonnefoy
