From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-05-26 00:25:46 |
Message-ID: | 521710.1748219146@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Sun, May 25, 2025 at 3:22 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So far, I've failed to get anything useful out of core files
>> from this failure. The trace goes back no further than
>> (lldb) bt
>> * thread #1
>> * frame #0: 0x000000018de39388 libsystem_kernel.dylib`__pthread_kill + 8
> (And Alexander reported the same off-list.) It's interesting that the
> elog.c backtrace stuff is able to analyse the stack and it looks
> normal AFAICS. Could that be interfering with the stack in the core?!
No, but something is. Just to make sure it wasn't totally broken,
I added a sure-to-fail Assert in a random place (I chose
pg_backend_pid), and I get both a trace in the postmaster log and a
perfectly usable core file:
TRAP: failed Assert("MyProcPid == 0"), File: "pgstatfuncs.c", Line: 692, PID: 59063
0 postgres 0x00000001031f1fa4 ExceptionalCondition + 108
1 postgres 0x00000001031672b4 pg_stat_get_backend_pid + 0
2 postgres 0x0000000102e9e598 ExecInterpExpr + 5524
3 postgres 0x0000000102edb100 ExecResult + 368
4 postgres 0x0000000102ea6418 standard_ExecutorRun + 316
(lldb) bt
* thread #1
* frame #0: 0x00000001836b5388 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x00000001836ee88c libsystem_pthread.dylib`pthread_kill + 296
frame #2: 0x00000001835f7c60 libsystem_c.dylib`abort + 124
frame #3: 0x000000010491dfac postgres`ExceptionalCondition(conditionName=<unavailable>, fileName=<unavailable>, lineNumber=692) at assert.c:66:2 [opt]
frame #4: 0x000000010489329c postgres`pg_backend_pid(fcinfo=<unavailable>) at pgstatfuncs.c:692:2 [opt]
frame #5: 0x00000001045ca598 postgres`ExecInterpExpr(state=0x000000013780d190, econtext=0x000000013780ce38, isnull=<unavailable>) at execExprInterp.c:0 [opt]
frame #6: 0x0000000104607100 postgres`ExecResult [inlined] ExecEvalExprNoReturn(state=<unavailable>, econtext=0x000000013780ce38) at executor.h:417:13 [opt]
frame #7: 0x00000001046070f4 postgres`ExecResult [inlined] ExecEvalExprNoReturnSwitchContext(state=<unavailable>, econtext=0x000000013780ce38) at executor.h:458:2 [opt]
The fact that I can trace through this Assert failure but not the
AIO one strongly suggests some system-level problem in the latter.
There is something rotten in the state of Denmark.
For completeness, this is with Sequoia 15.5 (latest macOS) on
an M4 Pro MacBook.
> but I haven't seen this failure on my little M4 MacBook Air yet
> (Sequoia 15.5, Apple clang-1700.0.13.3). It is infected with
> corporate security-ware that intercepts at least file system stuff and
> slows it down and I can't even convince it to dump core files right
> now.
As far as that goes, if you have SIP turned on (which I'm sure a
corporate laptop would), there are extra steps needed to get a
core to happen. See [1]; that page is old, but the recipe still
works for me.
regards, tom lane
[1] https://nasa.github.io/trick/howto_guides/How-to-dump-core-file-on-MacOS.html
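
As a side note on the core-file discussion above: independent of the SIP-related steps in [1] (which are not reproduced here), a process will not dump core at all if its RLIMIT_CORE soft limit is zero. The following is only a minimal sketch of how one might check and raise that limit from C using the POSIX getrlimit/setrlimit calls. The helper names core_limit_nonzero and raise_core_limit are hypothetical and do not come from PostgreSQL or from the referenced page.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether the soft core-size limit is nonzero.
 * Returns 1 if nonzero, 0 if zero, -1 if getrlimit() fails.
 */
int
core_limit_nonzero(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/*
 * Hypothetical helper: raise the soft core-size limit to the hard limit,
 * roughly what "ulimit -c unlimited" attempts from a shell.
 * Returns 0 on success, -1 on failure.
 */
int
raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

A nonzero limit is necessary but not sufficient: on a machine with SIP enabled, the extra steps described in [1] presumably still apply before a core file actually appears.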