| From: | surya poondla <suryapoondla4(at)gmail(dot)com> |
|---|---|
| To: | dllggyx(at)outlook(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19382: Server crash at __nss_database_lookup |
| Date: | 2026-01-23 01:18:08 |
| Message-ID: | CAOVWO5rVBKsjG4YwO_PJQu2OBGp8qUdF1jineYY6Lm3zc6-KWQ@mail.gmail.com |
| Lists: | pgsql-bugs |
Hi Yuxiao, Kirill,
Thank you for the test cases.
I can reproduce this issue on PostgreSQL 17.6. I debugged it with lldb and
found the root cause.
When a composite type is altered mid-transaction while a PL/pgSQL record
variable holds data of that type, the server crashes because it interprets
old data using the new type definition without performing type conversion.
The server crashes with this stack trace:
* thread #1, queue = 'main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x117e00000)
frame #0: 0x0000000183c95320 libsystem_platform.dylib`_platform_memmove + 96
libsystem_platform.dylib`_platform_memmove:
-> 0x183c95320 <+96>: ldnp q0, q1, [x1]
0x183c95324 <+100>: add x1, x1, #0x20
0x183c95328 <+104>: subs x2, x2, #0x20
0x183c9532c <+108>: b.hi 0x183c95318 ; <+88>
Target 0: (postgres) stopped.
(lldb) bt
* thread #1, queue = 'main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x117e00000)
* frame #0: 0x0000000183c95320 libsystem_platform.dylib`_platform_memmove + 96
frame #1: 0x00000001030ef368 postgres`text_to_cstring(t=0x0000000117017b1c) at varlena.c:225:2
frame #2: 0x00000001030f0e58 postgres`textout(fcinfo=0x000000016d5f1b98) at varlena.c:594:2
frame #3: 0x000000010314ed14 postgres`FunctionCall1Coll(flinfo=0x0000000121808cd8, collation=0, arg1=4680940316) at fmgr.c:1139:11
frame #4: 0x0000000103150880 postgres`OutputFunctionCall(flinfo=0x0000000121808cd8, val=4680940316) at fmgr.c:1685:25
frame #5: 0x0000000103075c8c postgres`record_out(fcinfo=0x000000016d5f1d58) at rowtypes.c:435:11
frame #6: 0x000000010314ed14 postgres`FunctionCall1Coll(flinfo=0x0000000121808a28, collation=0, arg1=4940960546) at fmgr.c:1139:11
frame #7: 0x0000000103150880 postgres`OutputFunctionCall(flinfo=0x0000000121808a28, val=4940960546) at fmgr.c:1685:25
frame #8: 0x000000010282fa30 postgres`printtup(slot=0x00000001218087a8, self=0x00000001170102d8) at printtup.c:360:16
frame #9: 0x0000000102b8fdac postgres`ExecutePlan(queryDesc=0x0000000137010300, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00000001170102d8) at execMain.c:1679:9
frame #10: 0x0000000102b8fb98 postgres`standard_ExecutorRun(queryDesc=0x0000000137010300, direction=ForwardScanDirection, count=0, execute_once=false) at execMain.c:360:3
frame #11: 0x0000000102b8f988 postgres`ExecutorRun(queryDesc=0x0000000137010300, direction=ForwardScanDirection, count=0, execute_once=false) at execMain.c:306:3
frame #12: 0x0000000102ee2bd4 postgres`PortalRunSelect(portal=0x000000012782c500, forward=true, count=0, dest=0x00000001170102d8) at pquery.c:922:4
frame #13: 0x0000000102ee2568 postgres`PortalRun(portal=0x000000012782c500, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00000001170102d8, altdest=0x00000001170102d8, qc=0x000000016d5f21b8) at pquery.c:766:18
frame #14: 0x0000000102edce9c postgres`exec_simple_query(query_string="SELECT bar();") at postgres.c:1278:10
frame #15: 0x0000000102edbf6c postgres`PostgresMain(dbname="postgres", username="surya") at postgres.c:4767:7
frame #16: 0x0000000102ed3594 postgres`BackendMain(startup_data="", startup_data_len=4) at backend_startup.c:106:2
frame #17: 0x0000000102daf8f8 postgres`postmaster_child_launch(child_type=B_BACKEND, startup_data="", startup_data_len=4, client_sock=0x000000016d5f25b8) at launch_backend.c:277:3
frame #18: 0x0000000102db7708 postgres`BackendStartup(client_sock=0x000000016d5f25b8) at postmaster.c:3624:8
frame #19: 0x0000000102db4438 postgres`ServerLoop at postmaster.c:1678:6
frame #20: 0x0000000102db3324 postgres`PostmasterMain(argc=3, argv=0x000060000321d420) at postmaster.c:1376:11
frame #21: 0x0000000102c369c0 postgres`main(argc=3, argv=0x000060000321d420) at main.c:199:3
frame #22: 0x00000001838bab98 dyld`start + 6076
(lldb)
The crash happens because textout() is called on field data that is really an int4: the four bytes holding 1073741824 (2^30) are interpreted as the header of a text value.
I set breakpoints at two critical points to trace the issue:
Breakpoint 1: ExpandedRecordGetDatum (when PL/pgSQL returns the record)
At this point, the record still has complete version information:
(lldb) p erh->er_tupdesc_id
(uint64) 2 // Record was created with version 2
(lldb) p assign_record_type_identifier(erh->er_typeid, erh->er_typmod)
(uint64) 4 // Current type is now version 4
(lldb) p erh->er_tupdesc->attrs[1].atttypid
(Oid) 23 // Field b was INT4 when record was created
(lldb) p ((TypeCacheEntry*)lookup_type_cache(erh->er_typeid, 0x00100))->tupDesc->attrs[1].atttypid
(Oid) 25 // Field b is now TEXT in current definition
Version mismatch detected (2 != 4). The record has integer data but the
type definition changed to TEXT.
Breakpoint 2: record_out (when converting record to text for output)
After ExpandedRecordGetDatum() flattens the record into a HeapTupleHeader, the
version information is lost:
(lldb) p tupType
(Oid) 32770 // Only type OID preserved
(lldb) p tupTypmod
(int32) -1 // Only typmod preserved
(lldb) p tupdesc->attrs[1].atttypid
(Oid) 25 // Uses current definition: TEXT
When the ExpandedRecordHeader is flattened into a HeapTupleHeader, only the
type OID and typmod are stored; the version identifier is not.
record_out() therefore looks up the tuple descriptor by type OID and typmod
(lookup_rowtype_tupdesc()), which returns the current type definition
(version 4, field b = text), while the actual data is still from version 2
(field b = int4, value = 1073741824).
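The crash happens in record_out() (rowtypes.c:435), when it calls textout()
on field b. Because text is a variable-length type, textout() receives a
pointer into the tuple (t=0x0000000117017b1c in frame #1) rather than the
integer itself. text_to_cstring() then reads the four bytes holding
0x40000000 (1073741824, 2^30) as a varlena length header, computes a length
of roughly 256 MB, and its copy at varlena.c:225 runs past the end of mapped
memory, causing the segfault that crashes the backend.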
I believe the fix belongs in pl_exec.c, before the record is returned. At
that point we still have access to erh->er_tupdesc_id and can compare it
with the current tupDesc_identifier; if they differ, the type was altered.
For each field whose type changed, apply a conversion using
exec_cast_value().
If the conversion fails or no cast exists, raise a proper error; otherwise,
return the converted record with the updated version.
This prevents the crash by either converting the data (int4 to text, which
should succeed) or raising a clean error message instead of a segfault.
I am working on a patch for this.
Kindly let me know your thoughts.