| From: | SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | [PATCH] Fix WAIT FOR LSN standby_write/standby_flush for archive recovery cases |
| Date: | 2026-04-15 06:44:23 |
| Message-ID: | CAHg+QDeHkMcLBKaBu6sxigL2gUsHXye3QQs14zKyD25BnPNAvA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Alexander, Hackers,
GetCurrentLSNForWaitType() for WAIT_LSN_TYPE_STANDBY_WRITE and
WAIT_LSN_TYPE_STANDBY_FLUSH previously relied on the WAL receiver's
tracked write/flush positions (GetWalRcvWriteRecPtr/GetWalRcvFlushRecPtr).
There are two scenarios where WAIT FOR LSN queries can stall even though
replay is making progress. I describe them separately to make the setups
clear, but the underlying problem is the same.
(1) When the standby is disconnected from the primary and switches to
reading WAL from the archive, it stays in that mode until no more archived
WAL is available to replay, and only then switches back to streaming. Until
then, WAIT FOR LSN calls get stuck on the standby even though replay has
advanced beyond the stale WAL receiver position. Switching the XLog source
from archive to streaming is tracked separately in [1].
(2) In archive recovery, no WAL receiver process exists, so these functions
return InvalidXLogRecPtr (0/0). WAIT FOR LSN in standby_flush or
standby_write mode would always time out, even for WAL that has been fully
replayed.
Fix by falling back to the replay LSN (GetXLogReplayRecPtr) when the WAL
receiver position is invalid or behind replay. This is correct because any
WAL that has been replayed has necessarily already been written and flushed
to disk. The fix and a TAP test that reproduces the problem are attached.
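For readers skimming the thread, the fallback rule described above can be sketched roughly as follows. This is a standalone illustration, not the code in the attached patch: effective_standby_lsn() and example_usage() are hypothetical names, and the XLogRecPtr typedef and InvalidXLogRecPtr macro are simplified stand-ins for the server's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the server's definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper: choose the LSN to report for standby_write or
 * standby_flush waits. receiver_lsn models the WAL receiver's tracked
 * position; replay_lsn models the current replay position.
 */
XLogRecPtr
effective_standby_lsn(XLogRecPtr receiver_lsn, XLogRecPtr replay_lsn)
{
    /* No WAL receiver (scenario 2) or a stale receiver position (scenario 1). */
    if (receiver_lsn == InvalidXLogRecPtr || receiver_lsn < replay_lsn)
        return replay_lsn;

    /* Streaming normally: the receiver is at or ahead of replay. */
    return receiver_lsn;
}

/* Hypothetical usage covering both scenarios and the normal case. */
void
example_usage(void)
{
    /* Scenario 2: archive recovery, receiver position is 0/0. */
    assert(effective_standby_lsn(InvalidXLogRecPtr, 0x3000) == 0x3000);
    /* Scenario 1: receiver position is stale, replay has moved past it. */
    assert(effective_standby_lsn(0x1000, 0x3000) == 0x3000);
    /* Normal streaming: receiver is ahead of replay. */
    assert(effective_standby_lsn(0x5000, 0x3000) == 0x5000);
}
```

Taking the larger of the two positions keeps the reported LSN from going backwards when the WAL source changes, which is the property the waiters depend on.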
Thanks,
Satya
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-WAIT-FOR-LSN-standby_write-standby_flush-for-arc.patch | application/octet-stream | 2.8 KB |
| 0001-Add-TAP-test-for-WAIT-FOR-LSN-during-archive-recover.patch | application/octet-stream | 7.4 KB |