From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Zane Duffield' <duffieldzane(at)gmail(dot)com> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Subject: | RE: Lock timeouts and unusual spikes in replication lag with logical parallel transaction streaming |
Date: | 2025-08-20 09:58:59 |
Message-ID: | OSCPR01MB14966ED7F614AFF9EAD38353DF533A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Zane,
While analyzing your post and code, I found that the parallel apply worker cannot
accept a lock timeout. IIUC, that is why the lock timeout was rarely reported and
the parallel apply worker exited on its own instead.
A lock timeout is implemented by sending SIGINT to the process. Backends install
StatementCancelHandler as the SIGINT handler, so a process that is waiting on
something (e.g., a lock) errors out. See CHECK_FOR_INTERRUPTS -> ProcessInterrupts.
The error message is: "canceling statement due to lock timeout".
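The following is a minimal standalone C sketch of that backend-side mechanism using plain POSIX signals. It is not PostgreSQL code: `cancel_pending`, `lock_timeout_fired`, `wait_for_lock()` and the tick-based "lock" are hypothetical stand-ins for QueryCancelPending, the lock-timeout indicator and a real lock wait, and a POSIX interval timer (SIGALRM) stands in for PostgreSQL's timeout infrastructure. The timeout handler sends SIGINT to its own process, the SIGINT handler only sets a flag, and the wait loop checks that flag on every wakeup, roughly as CHECK_FOR_INTERRUPTS does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Set by the SIGINT handler (hypothetical stand-in for QueryCancelPending). */
static volatile sig_atomic_t cancel_pending = 0;
/* Records that the timer caused the SIGINT (stand-in for the timeout indicator). */
static volatile sig_atomic_t lock_timeout_fired = 0;

/* Backend-style SIGINT handler: only remembers that a cancel was requested. */
static void cancel_handler(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

/* Timer handler: the lock timeout is delivered to ourselves as SIGINT. */
static void lock_timeout_handler(int signo)
{
    (void) signo;
    lock_timeout_fired = 1;
    kill(getpid(), SIGINT);
}

/* Arms a one-shot real-time timer; ms == 0 disarms it. */
static void arm_timer_ms(int ms)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/*
 * Simulated lock wait: the "lock" is granted after grant_after_ticks
 * sleeps of 1 ms each. Returns 0 if granted, -1 if the wait was cancelled.
 */
static int wait_for_lock(int timeout_ms, int grant_after_ticks)
{
    struct sigaction sa;
    struct timespec tick = {0, 1000000L};

    assert(timeout_ms > 0);
    cancel_pending = 0;
    lock_timeout_fired = 0;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = cancel_handler;
    sigaction(SIGINT, &sa, NULL);
    sa.sa_handler = lock_timeout_handler;
    sigaction(SIGALRM, &sa, NULL);

    arm_timer_ms(timeout_ms);
    for (int waited = 0; waited < grant_after_ticks; waited++)
    {
        /* CHECK_FOR_INTERRUPTS()-style check on every wakeup */
        if (cancel_pending)
        {
            arm_timer_ms(0);
            fprintf(stderr, "ERROR:  canceling statement due to %s\n",
                    lock_timeout_fired ? "lock timeout" : "user request");
            return -1;
        }
        nanosleep(&tick, NULL);
    }
    arm_timer_ms(0);
    return 0;
}
```

For example, `wait_for_lock(50, 100000)` should return -1 after roughly 50 ms and print the lock-timeout error, while `wait_for_lock(1000, 5)` should return 0 because the simulated lock is granted before the timer fires.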
The parallel apply worker, however, overrides the SIGINT handler because it uses
SIGINT to detect a shutdown request from the leader process. When the parallel
apply worker receives SIGINT, it exits only once it reaches its main loop. Unlike
the backend case above, the process therefore does not error out while waiting for
the lock; it exits after it becomes idle or receives the next chunk, and the
message is the same as for a normal shutdown.
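A hedged sketch of the parallel apply worker's situation, under the same assumptions as a plain POSIX model (hypothetical names, SIGALRM standing in for the lock timeout, a tick loop standing in for the lock wait; not the actual worker code): SIGINT is handled as a shutdown request, so the timeout's SIGINT only sets `shutdown_requested`, the wait continues until the simulated lock is granted, and the worker then leaves from its main loop as if shut down normally.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

enum outcome { OUTCOME_APPLIED, OUTCOME_TIMEOUT_ERROR, OUTCOME_SHUTDOWN_EXIT };

/* Hypothetical stand-in for ShutdownRequestPending. */
static volatile sig_atomic_t shutdown_requested = 0;
/* Cancel flag: nothing sets it, because SIGINT no longer means "cancel". */
static volatile sig_atomic_t cancel_pending = 0;

/* SIGINT repurposed: it only records a shutdown request. */
static void shutdown_handler(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* The lock timeout still arrives as SIGINT. */
static void lock_timeout_handler(int signo)
{
    (void) signo;
    kill(getpid(), SIGINT);
}

/* Arms a one-shot real-time timer; ms == 0 disarms it. */
static void arm_timer_ms(int ms)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/* Applies one change whose lock is granted after grant_after_ticks ms. */
static enum outcome apply_change(int timeout_ms, int grant_after_ticks)
{
    struct sigaction sa;
    struct timespec tick = {0, 1000000L};

    assert(timeout_ms > 0);
    shutdown_requested = 0;
    cancel_pending = 0;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = shutdown_handler;
    sigaction(SIGINT, &sa, NULL);
    sa.sa_handler = lock_timeout_handler;
    sigaction(SIGALRM, &sa, NULL);

    arm_timer_ms(timeout_ms);
    for (int waited = 0; waited < grant_after_ticks; waited++)
    {
        if (cancel_pending)          /* never taken in this configuration */
        {
            arm_timer_ms(0);
            return OUTCOME_TIMEOUT_ERROR;
        }
        nanosleep(&tick, NULL);
    }
    arm_timer_ms(0);

    /* Back in the main loop: only now is the pending "shutdown" noticed. */
    return shutdown_requested ? OUTCOME_SHUTDOWN_EXIT : OUTCOME_APPLIED;
}
```

Here `apply_change(20, 200)` should return OUTCOME_SHUTDOWN_EXIT: the 20 ms timeout fires during the roughly 200 ms wait, but nothing cancels the wait, so the timeout surfaces only as an apparently normal exit.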
IIUC, lock timeout should work for every process that accesses and modifies
database objects, so the current behavior should be fixed.
My idea is to use a different signal to request shutdown of parallel apply workers.
Since the checkpointer and walsender use SIGUSR2 for a similar purpose, the attached
patches also use it for the parallel apply worker. This issue has existed since PG16.
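A minimal sketch of the proposed signal split, again as a standalone POSIX model with hypothetical names; it illustrates the idea and is not taken from the attached patches. SIGUSR2 carries the leader's shutdown request while SIGINT keeps its backend meaning, so a lock timeout cancels the wait and a shutdown request is still processed in the main loop.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

enum outcome { OUTCOME_APPLIED, OUTCOME_TIMEOUT_ERROR, OUTCOME_SHUTDOWN_EXIT };

static volatile sig_atomic_t cancel_pending = 0;      /* set by SIGINT */
static volatile sig_atomic_t shutdown_requested = 0;  /* set by SIGUSR2 */

static void cancel_handler(int signo)   { (void) signo; cancel_pending = 1; }
static void shutdown_handler(int signo) { (void) signo; shutdown_requested = 1; }

/* The lock timeout is still delivered as SIGINT, now meaning "cancel". */
static void lock_timeout_handler(int signo)
{
    (void) signo;
    kill(getpid(), SIGINT);
}

/* Arms a one-shot real-time timer; ms == 0 disarms it. */
static void arm_timer_ms(int ms)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = ms / 1000;
    it.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);
}

/*
 * Applies one change whose lock is granted after grant_after_ticks ms.
 * If leader_sends_shutdown is nonzero, a shutdown request (SIGUSR2) is
 * sent to ourselves first, simulating the leader process.
 */
static enum outcome apply_change(int timeout_ms, int grant_after_ticks,
                                 int leader_sends_shutdown)
{
    struct sigaction sa;
    struct timespec tick = {0, 1000000L};

    assert(timeout_ms > 0);
    cancel_pending = 0;
    shutdown_requested = 0;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = cancel_handler;
    sigaction(SIGINT, &sa, NULL);
    sa.sa_handler = shutdown_handler;
    sigaction(SIGUSR2, &sa, NULL);
    sa.sa_handler = lock_timeout_handler;
    sigaction(SIGALRM, &sa, NULL);

    if (leader_sends_shutdown)
        kill(getpid(), SIGUSR2);

    arm_timer_ms(timeout_ms);
    for (int waited = 0; waited < grant_after_ticks; waited++)
    {
        if (cancel_pending)          /* the lock timeout now cancels the wait */
        {
            arm_timer_ms(0);
            return OUTCOME_TIMEOUT_ERROR;
        }
        nanosleep(&tick, NULL);
    }
    arm_timer_ms(0);

    /* Main loop: a shutdown request is still honored here. */
    return shutdown_requested ? OUTCOME_SHUTDOWN_EXIT : OUTCOME_APPLIED;
}
```

With this split, `apply_change(20, 100000, 0)` should return OUTCOME_TIMEOUT_ERROR, and `apply_change(1000, 5, 1)` should return OUTCOME_SHUTDOWN_EXIT once the simulated lock is granted.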
Note that this does not solve the issue you initially reported; it only allows the
parallel apply worker to report the lock timeout and exit. The replication lag
cannot be resolved by this alone.
Per the documentation [1], setting lock_timeout globally is not recommended.
[1]: https://www.postgresql.org/docs/17/runtime-config-client.html#GUC-LOCK-TIMEOUT
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v1-PG16-PG17-0001-Make-parallel-apply-worker-accept-lock-timeo.patch | application/octet-stream | 3.9 KB |
v1-PG18-master-0001-Make-parallel-apply-worker-accept-lock-timeout.patch | application/octet-stream | 3.9 KB |