| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Iwata, Aya/岩田 彩 <iwata(dot)aya(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE |
| Date: | 2026-03-31 17:00:00 |
| Message-ID: | fb4028b6-e07f-4d84-a65e-c90bc96e6356@gmail.com |
| Lists: | pgsql-hackers |
31.03.2026 13:54, Michael Paquier wrote:
> On Tue, Mar 31, 2026 at 10:00:00AM +0300, Alexander Lakhin wrote:
>> So the backend is not completely stuck, but CommitTransactionCommand()
>> may take more than 5 seconds under some circumstances (maybe it's worth
>> investigating which exactly).
> One could blame slow hardware, difficult to say, and I'm puzzled by
> these periodic bumps that don't seem to happen elsewhere.
I managed to get the backtrace of such a sluggish backend:
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
Id Target Id Frame
* 1 Thread 0x3fb2a4c620 (LWP 564194) "postgres" 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#0 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#1 0x0000002abef79444 in XLogFileClose () at xlog.c:3672
#2 0x0000002abef7cc66 in XLogWrite (WriteRqst=..., tli=tli@entry=1, flexible=flexible@entry=false) at xlog.c:2356
#3 0x0000002abef7dbfc in XLogFlush (record=33561688) at xlog.c:2892
#4 0x0000002abef77976 in RecordTransactionCommit () at xact.c:1516
#5 CommitTransaction () at xact.c:2379
#6 0x0000002abef78938 in CommitTransactionCommandInternal () at xact.c:3224
#7 0x0000002abef78acc in CommitTransactionCommand () at xact.c:3185
#8 0x0000003fb2a3ed88 in initialize_worker_spi (table=0x2abf8bf358) at worker_spi.c:132
#9 worker_spi_main (main_arg=<optimized out>) at worker_spi.c:181
....
(Three test runs produced the same stack trace.)
I think this explains why CommitTransactionCommand() can be slow and why
that does not happen every time. As for the other animals, I guess they
can experience the same bumps, just without exceeding 5 seconds (50 tries).
Thus, as I understand it, the failure requires both slow storage and
initialize_worker_spi() -> CommitTransactionCommand() reaching
XLogFileClose().
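To check whether the storage alone can account for such a stall, one could time the call seen in frame #0 outside of PostgreSQL. The following is only a rough sketch, not PostgreSQL code: the helper name time_fadvise_dontneed() and the scratch file path are made up for illustration, and it assumes that the advice issued on the XLogFileClose() path is POSIX_FADV_DONTNEED, which the backtrace itself does not show.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): write `len` bytes to a
 * scratch file at `path`, fsync it, then time a single
 * posix_fadvise(POSIX_FADV_DONTNEED) over the whole file.
 *
 * Returns the elapsed wall-clock time in seconds, or -1.0 if the file
 * could not be created, written, or synced. The scratch file is removed.
 */
double
time_fadvise_dontneed(const char *path, size_t len)
{
    char            buf[8192];
    struct timespec t0, t1;
    size_t          written = 0;
    int             fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1.0;

    memset(buf, 'x', sizeof(buf));
    while (written < len)
    {
        size_t  chunk = len - written;
        ssize_t n;

        if (chunk > sizeof(buf))
            chunk = sizeof(buf);
        n = write(fd, buf, chunk);
        if (n <= 0)
        {
            close(fd);
            unlink(path);
            return -1.0;
        }
        written += (size_t) n;
    }

    /* Flush the data first so that only the advice call itself is timed. */
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(path);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* offset 0, length 0 means "the whole file"; the result is ignored
     * here because the call is advisory only. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    unlink(path);

    return (double) (t1.tv_sec - t0.tv_sec) +
           (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_fadvise_dontneed() in a loop with a file of WAL segment size (16 MB by default) placed on the same filesystem as the data directory, and printing the largest value seen, would show whether a single call can come anywhere near the 5-second limit on that machine.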
Best regards,
Alexander