RE: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Zane Duffield' <duffieldzane(at)gmail(dot)com>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: RE: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming
Date: 2025-08-20 09:58:59
Message-ID: OSCPR01MB14966ED7F614AFF9EAD38353DF533A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-bugs

Dear Zane,

While analyzing your post and the code, I found that a parallel apply worker
cannot honor the lock timeout. IIUC, that is why the lock timeout is rarely
reported and the parallel apply worker exits on its own instead.

Lock timeout is implemented by sending SIGINT to the process. Backends install
StatementCancelHandler as the SIGINT handler, so the process errors out while it
is waiting for something. See CHECK_FOR_INTERRUPTS->ProcessInterrupts. The error
message would be: "canceling statement due to lock timeout".
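
To illustrate the mechanism, here is a minimal standalone sketch. It is not
PostgreSQL source: the toy_* names are hypothetical, the "timer" is simulated
with raise(), and only the error text is taken from the message quoted above.
The handler just records the request, and the wait loop notices it at its next
interrupt check.

```c
#include <signal.h>
#include <stdio.h>

/* Set by the SIGINT handler; polled by the wait loop. */
static volatile sig_atomic_t toy_cancel_pending = 0;

/* Toy stand-in for StatementCancelHandler: it only records the request. */
static void toy_cancel_handler(int signo)
{
    (void) signo;
    toy_cancel_pending = 1;
}

/* Toy stand-in for CHECK_FOR_INTERRUPTS(): returns 1 if the wait must abort. */
static int toy_check_for_interrupts(void)
{
    if (toy_cancel_pending)
    {
        toy_cancel_pending = 0;
        fprintf(stderr, "ERROR: canceling statement due to lock timeout\n");
        return 1;
    }
    return 0;
}

/*
 * Simulate waiting for a lock that stays busy for max_ticks ticks.  At tick
 * timeout_tick the simulated timer fires and sends SIGINT to this process.
 * Returns the tick at which the wait was cancelled, or -1 if it never was.
 */
int toy_wait_for_lock(int max_ticks, int timeout_tick)
{
    signal(SIGINT, toy_cancel_handler);
    toy_cancel_pending = 0;

    for (int tick = 0; tick < max_ticks; tick++)
    {
        if (tick == timeout_tick)
            raise(SIGINT);              /* the lock timeout expiring */
        if (toy_check_for_interrupts())
            return tick;                /* error out inside the wait */
    }
    return -1;
}
```

With this sketch, toy_wait_for_lock(10, 3) is cancelled at tick 3, while a
timeout that never fires within the wait leaves it uncancelled.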

The parallel apply worker, however, overwrites the signal handler for SIGINT; it
uses SIGINT to detect a shutdown request from the leader process. When a
parallel apply worker receives SIGINT, it exits the next time it reaches the main
loop. Unlike the backend case above, the process does not exit while waiting for
the lock; it exits after becoming idle or receiving the next chunk. The message
is the same as in the normal shutdown case.
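
The difference can be sketched with a similar toy model. Again this is not
PostgreSQL source: the pa_toy_* names are hypothetical and the timer is
simulated with raise(). Because the replacement handler only sets a shutdown
flag that the lock wait never looks at, the wait runs to completion and the
exit happens later, at the main-loop check.

```c
#include <signal.h>

/* Set by the replacement SIGINT handler; examined only in the main loop. */
static volatile sig_atomic_t pa_toy_shutdown_requested = 0;

/* Toy stand-in for the worker's own SIGINT handler: shutdown, not cancel. */
static void pa_toy_sigint_handler(int signo)
{
    (void) signo;
    pa_toy_shutdown_requested = 1;
}

/*
 * Simulate one pass of the worker: wait lock_ticks ticks for a lock, with
 * the simulated lock timeout arriving as SIGINT at tick timeout_tick.
 * Returns the number of ticks spent waiting.  *exited_in_main_loop is set to
 * 1 if the worker then exits at the main-loop check, otherwise to 0.
 */
int pa_toy_apply_once(int lock_ticks, int timeout_tick,
                      int *exited_in_main_loop)
{
    int waited = 0;

    signal(SIGINT, pa_toy_sigint_handler);
    pa_toy_shutdown_requested = 0;

    /* Lock wait: nothing in here checks pa_toy_shutdown_requested. */
    for (int tick = 0; tick < lock_ticks; tick++)
    {
        if (tick == timeout_tick)
            raise(SIGINT);
        waited++;
    }

    /* Back in the main loop: only now is the flag examined. */
    *exited_in_main_loop = pa_toy_shutdown_requested ? 1 : 0;
    return waited;
}
```

Here a timeout at tick 3 of a 10-tick wait still waits all 10 ticks, and the
worker then leaves through the ordinary shutdown path.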

IIUC, lock timeout should be enabled for all processes that access and modify
database objects, hence the current state should be fixed.

My idea is to use a different signal to request shutdown of parallel apply
workers. Since checkpointer and walsender use SIGUSR2 for a similar purpose, this
patch also uses it for the parallel apply worker. This issue has existed since PG16.
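
The idea of separating the two signals can be sketched as follows. This is a
toy model under stated assumptions, not the attached patch: the fx_* names and
the outcome enum are hypothetical, and signals are simulated with raise().
SIGINT is left to mean "cancel the current wait", while SIGUSR2 carries the
shutdown request and is checked only in the main loop.

```c
#include <signal.h>

static volatile sig_atomic_t fx_cancel_pending = 0;
static volatile sig_atomic_t fx_shutdown_requested = 0;

/* SIGINT keeps its usual meaning: cancel the wait (e.g. lock timeout). */
static void fx_sigint_handler(int signo)
{
    (void) signo;
    fx_cancel_pending = 1;
}

/* SIGUSR2 is the (assumed) shutdown request from the leader. */
static void fx_sigusr2_handler(int signo)
{
    (void) signo;
    fx_shutdown_requested = 1;
}

enum fx_outcome { FX_LOCK_ACQUIRED, FX_LOCK_TIMEOUT, FX_SHUTDOWN };

/*
 * Simulate one pass of the worker: wait lock_ticks ticks for a lock and
 * deliver signal signo to this process at tick signal_tick.
 */
enum fx_outcome fx_toy_apply_once(int lock_ticks, int signal_tick, int signo)
{
    signal(SIGINT, fx_sigint_handler);
    signal(SIGUSR2, fx_sigusr2_handler);
    fx_cancel_pending = 0;
    fx_shutdown_requested = 0;

    for (int tick = 0; tick < lock_ticks; tick++)
    {
        if (tick == signal_tick)
            raise(signo);
        if (fx_cancel_pending)
            return FX_LOCK_TIMEOUT;     /* error out inside the wait */
    }

    /* Main-loop check: the shutdown request is honored only here. */
    return fx_shutdown_requested ? FX_SHUTDOWN : FX_LOCK_ACQUIRED;
}
```

In this model, SIGINT during the wait yields FX_LOCK_TIMEOUT immediately, and
SIGUSR2 during the wait yields FX_SHUTDOWN once the wait is over.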

Note that this does not actually solve the issue that was initially reported; it
only allows the parallel apply (pa) worker to report the lock timeout and exit.
The replication lag cannot be resolved by this alone.
Per the documentation [1], setting lock_timeout globally is not recommended.

[1]: https://www.postgresql.org/docs/17/runtime-config-client.html#GUC-LOCK-TIMEOUT

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v1-PG16-PG17-0001-Make-parallel-apply-worker-accept-lock-timeo.patch application/octet-stream 3.9 KB
v1-PG18-master-0001-Make-parallel-apply-worker-accept-lock-timeout.patch application/octet-stream 3.9 KB
