Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Iwata, Aya/岩田 彩 <iwata(dot)aya(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kuroda, Hayato/黒田 隼人 <kuroda(dot)hayato(at)fujitsu(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE
Date: 2026-03-31 17:00:00
Message-ID: fb4028b6-e07f-4d84-a65e-c90bc96e6356@gmail.com
Lists: pgsql-hackers

31.03.2026 13:54, Michael Paquier wrote:
> On Tue, Mar 31, 2026 at 10:00:00AM +0300, Alexander Lakhin wrote:
>> So the backend is not completely stuck, but CommitTransactionCommand()
>> may take more than 5 seconds under some circumstances (maybe it's worth
>> investigating which exactly).
> One could blame slow hardware, difficult to say, and I'm puzzled by
> these periodic bumps that don't seem to happen elsewhere.

I managed to get the backtrace of such a sluggish backend:
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
  Id   Target Id                                   Frame
* 1    Thread 0x3fb2a4c620 (LWP 564194) "postgres" 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#0  0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#1  0x0000002abef79444 in XLogFileClose () at xlog.c:3672
#2  0x0000002abef7cc66 in XLogWrite (WriteRqst=..., tli=tli@entry=1, flexible=flexible@entry=false) at xlog.c:2356
#3  0x0000002abef7dbfc in XLogFlush (record=33561688) at xlog.c:2892
#4  0x0000002abef77976 in RecordTransactionCommit () at xact.c:1516
#5  CommitTransaction () at xact.c:2379
#6  0x0000002abef78938 in CommitTransactionCommandInternal () at xact.c:3224
#7  0x0000002abef78acc in CommitTransactionCommand () at xact.c:3185
#8  0x0000003fb2a3ed88 in initialize_worker_spi (table=0x2abf8bf358) at worker_spi.c:132
#9  worker_spi_main (main_arg=<optimized out>) at worker_spi.c:181
....
(Three test runs produced the same stack trace.)
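To check whether a given machine's storage makes this call slow, one could time it in isolation. Below is a rough standalone C sketch, not PostgreSQL code. The helper name time_fadvise_dontneed(), the 16 MB file size (the default WAL segment size) and the scratch directory are assumptions. I also assume XLogFileClose() requests POSIX_FADV_DONTNEED over the whole file. The sketch writes and fsyncs a segment-sized file, then measures only the posix_fadvise() call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Assumed size: the default 16 MB WAL segment size. */
#define SEGMENT_SIZE (16 * 1024 * 1024)

/*
 * Hypothetical helper: write and fsync a segment-sized scratch file in dir,
 * then return how many milliseconds posix_fadvise(POSIX_FADV_DONTNEED) took
 * on it, or -1.0 on any error.  The scratch file is unlinked right away, so
 * it disappears once the descriptor is closed.
 */
static double
time_fadvise_dontneed(const char *dir)
{
    char        path[4096];
    char       *buf;
    int         fd;
    int         rc;
    struct timespec start, end;

    assert(dir != NULL);

    snprintf(path, sizeof(path), "%s/fadvise_test_XXXXXX", dir);
    fd = mkstemp(path);
    if (fd < 0)
        return -1.0;
    unlink(path);

    buf = malloc(SEGMENT_SIZE);
    if (buf == NULL)
    {
        close(fd);
        return -1.0;
    }
    memset(buf, 0, SEGMENT_SIZE);

    /* Fill and flush the file, roughly like a WAL segment already written out. */
    if (write(fd, buf, SEGMENT_SIZE) != SEGMENT_SIZE || fsync(fd) != 0)
    {
        free(buf);
        close(fd);
        return -1.0;
    }
    free(buf);

    /* Time only the cache-dropping advice (assumed to mirror XLogFileClose()). */
    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    if (rc != 0)
        return -1.0;
    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

For example, calling printf("%.1f ms\n", time_fadvise_dontneed(dir)) a few times, with dir on the same filesystem as the animal's WAL, would show whether the call alone can take seconds there. The sketch does not reproduce concurrent I/O load, so a fast result would not rule the theory out.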

I think this explains both the slow CommitTransactionCommand() and why
it doesn't happen every time: presumably XLogFileClose() is reached only
when the commit's WAL flush crosses into a new WAL segment. As for other
buildfarm animals, I guess they can experience the same bumps, but not
ones exceeding 5 seconds (50 tries). So, as I understand it, the failure
requires both slow storage and initialize_worker_spi() ->
CommitTransactionCommand() reaching XLogFileClose().
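The timing argument can be modelled in a few lines of C. This is a hypothetical model, not the actual wait loop. It assumes the "50 tries" are checks spaced 100 ms apart, with the waiter checking first and sleeping before the next try. It also assumes the worker can only exit once its in-progress commit has finished.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed parameters of the waiter's retry loop: 50 tries with a 100 ms
 * sleep after each unsuccessful check, i.e. a 5 s budget.
 */
#define WAIT_TRIES    50
#define WAIT_SLEEP_MS 100

/*
 * Hypothetical model: the worker is asked to terminate at t = 0 but can only
 * exit once its commit finishes, commit_ms later.  The waiter checks at
 * t = 0, 100, ..., 4900 ms and gives up after the final sleep.  Returns true
 * if some check finds the worker already gone.
 */
static bool
waiter_sees_exit(int commit_ms)
{
    assert(commit_ms >= 0);

    for (int tries = 0; tries < WAIT_TRIES; tries++)
    {
        int now_ms = tries * WAIT_SLEEP_MS;

        if (now_ms >= commit_ms)
            return true;        /* worker already exited: wait succeeds */
        /* otherwise the waiter sleeps WAIT_SLEEP_MS and checks again */
    }
    return false;               /* budget exhausted: the waiter gives up */
}
```

Under these assumptions any commit stall shorter than about 4.9 s goes unnoticed, which would fit the other animals passing, while a stall in XLogFileClose() beyond that makes the waiter give up.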

Best regards,
Alexander
