| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kirill Reshke <reshkekirill(at)gmail(dot)com> |
| Subject: | Re: [PATCH] Add archive_mode=follow_primary to prevent unarchived WAL on standby promotion |
| Date: | 2025-10-27 05:26:21 |
| Message-ID: | CAHGQGwHNQcwsyLP4UqnUBoRPo4+vT=wvfe6reLX4TxwES-48qQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Oct 24, 2025 at 1:25 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Hi hackers,
>
> I'd like to propose a new archive_mode setting to address a gap in WAL
> archiving for high availability streaming replication configurations.
>
> ## Problem
>
> In HA setups using streaming replication, a standby can be promoted
> when the primary has failed, while some WAL segments might not yet be
> archived. This creates gaps in the WAL archive, breaking point-in-time
> recovery:
>
> 1. Primary generates WAL, streams to standby
> 2. Standby receives WAL, marks segments as .done immediately
> 3. Standby deletes WAL during checkpoints
> 4. Primary hasn't archived yet (archiver lag, network issues, etc.)
> 5. Primary vanishes
> 6. Standby gets promoted
> 7. WAL history lost from archive
>
> This is particularly problematic in synchronous replication where
> promotion might happen while the primary is still catching up on archival.
>
> A promoted standby might have some WAL from the walreceiver and some from the
> archive. In this case we need to archive only the WAL that was received but
> not confirmed as archived by the primary.
>
> ## Proposed Solution
>
> Add archive_mode=follow_primary, where standbys defer WAL deletion until
> the primary confirms archival:

Can't we achieve nearly the same behavior by setting archive_mode to
always and configuring archive_command on the standby to check
whether the WAL file already exists in the shared archive area,
e.g., test -f <archive directory>/%f (the WAL file size should
probably be checked as well)? In this setup, archive_command would fail
until the WAL file appears in the archive, and the standby would not
remove the file while the command keeps failing.
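
To make the idea concrete, here is a rough sketch of such a check, not a
tested implementation. The script name, its install path, and the archive
mount point /mnt/wal_archive are all hypothetical; the only parts taken from
the suggestion above are archive_mode = always, the existence test on %f,
and the size comparison.

```shell
#!/bin/sh
# Hypothetical wal_in_archive.sh: succeed only once the segment is already
# present in the shared archive, i.e. the primary has archived it.
#
# Assumed standby settings (paths are placeholders, not from the thread):
#   archive_mode = always
#   archive_command = '/usr/local/bin/wal_in_archive.sh %p /mnt/wal_archive/%f'

wal_in_archive() {
    local_wal=$1      # %p: path of the segment in the standby's pg_wal
    archived_wal=$2   # where the primary's archiver is expected to put %f

    # Not in the archive yet: fail so the server keeps the segment and retries.
    [ -f "$archived_wal" ] || return 1

    # Guard against a partially copied file by comparing byte counts.
    local_size=$(wc -c < "$local_wal") || return 1
    archived_size=$(wc -c < "$archived_wal") || return 1
    [ "$local_size" -eq "$archived_size" ]
}

# When invoked as archive_command, both arguments are supplied.
if [ "$#" -eq 2 ]; then
    wal_in_archive "$1" "$2"
    exit $?
fi
```

A non-zero exit status makes the archiver retry the same segment later, so
the file stays in pg_wal until the check passes. Note that this sketch only
verifies and never copies, so after promotion the command would presumably
have to be replaced by one that actually archives.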

Regards,

--
Fujii Masao