From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-committers(at)lists(dot)postgresql(dot)org |
Subject: | Re: pgsql: Improve runtime and output of tests for replication slots checkp |
Date: | 2025-06-20 23:30:58 |
Message-ID: | 2593441.1750462258@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers |
Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> I think this indicates unfinished intention to wait for checkpoint
> completion. But I think both cases (checkpoint finished and
> unfinished) should work correctly. So, I believe there is a backend
> problem. I'm trying to reproduce this locally. Sorry for the
> confusion.
I've found it not hugely difficult to reproduce: on two different
Mac M4 machines, repeated runs of 046_checkpoint_logical_slot.pl
hung up after 30 or so tries. Backtrace of the stuck query
looks like
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001836b01c8 libsystem_kernel.dylib`__semwait_signal + 8
frame #1: 0x000000018358c6f4 libsystem_c.dylib`nanosleep + 220
frame #2: 0x000000010070d358 postgres`pg_usleep(microsec=1000) at pgsleep.c:50:10 [opt]
frame #3: 0x00000001002a0508 postgres`read_local_xlog_page_guts(state=0x0000000126057298, targetPagePtr=687882240, reqLen=277, targetRecPtr=<unavailable>, cur_page="\U00000018\xd1\U00000005", wait_for_wal=true) at xlogutils.c:950:4 [opt]
frame #4: 0x0000000100298f0c postgres`ReadPageInternal(state=0x0000000126057298, pageptr=687882240, reqLen=277) at xlogreader.c:1061:12 [opt]
frame #5: 0x0000000100298300 postgres`XLogReadAhead [inlined] XLogDecodeNextRecord(state=0x0000000126057298, nonblocking=<unavailable>) at xlogreader.c:727:14 [opt]
frame #6: 0x00000001002980b4 postgres`XLogReadAhead(state=0x0000000126057298, nonblocking=false) at xlogreader.c:974:11 [opt]
frame #7: 0x0000000100297fd4 postgres`XLogReadRecord(state=0x0000000126057298, errormsg=0x000000016fc24d08) at xlogreader.c:405:3 [opt]
frame #8: 0x00000001004ee514 postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true, binary=<unavailable>) at logicalfuncs.c:257:13 [opt]
frame #9: 0x00000001004ee1e8 postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at logicalfuncs.c:333:9 [opt]
frame #10: 0x00000001003a77ec postgres`ExecMakeTableFunctionResult(setexpr=0x000000012603da40, econtext=<unavailable>, argContext=<unavailable>, expectedDesc=0x00000001260472b0, randomAccess=false) at execSRF.c:234:13 [opt]
frame #11: 0x00000001003ba02c postgres`FunctionNext(node=0x000000012603d6d0) at nodeFunctionscan.c:94:5 [opt]
frame #12: 0x00000001003b4894 postgres`fetch_input_tuple [inlined] ExecProcNode(node=0x000000012603d6d0) at executor.h:313:9 [opt]
frame #13: 0x00000001003b4878 postgres`fetch_input_tuple(aggstate=0x000000012603cf28) at nodeAgg.c:563:10 [opt]
frame #14: 0x00000001003b2eb4 postgres`ExecAgg [inlined] agg_retrieve_direct(aggstate=0x000000012603cf28) at nodeAgg.c:2450:17 [opt]
frame #15: 0x00000001003b2ab8 postgres`ExecAgg(pstate=0x000000012603cf28) at nodeAgg.c:2265:14 [opt]
so apparently it's waiting patiently for WAL that will never appear.
(I do see the 'checkpoint complete' log message before the shutdown
in this particular run.) I still have this debugger session open
if there's any particular data you'd like me to look at.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-06-20 23:42:38 | Re: pgsql: Improve runtime and output of tests for replication slots checkp |
Previous Message | Alexander Korotkov | 2025-06-20 23:28:32 | Re: pgsql: Improve runtime and output of tests for replication slots checkp |