Re: Any better way to ensure WAL continuity over failovers ?

From: Mohit Mishra <mohitmishra786687(at)gmail(dot)com>
To: harinath kanchu <kanchuharinath(at)gmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Any better way to ensure WAL continuity over failovers ?
Date: 2025-03-11 17:15:28
Message-ID: CAGB=j7q-_iFZ=TMhzWYNKqt6VY3fbKJhEguwSw3rRcHqo+8_gA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello Harinath,

Do you have any status of what happened later you did that change?

It will help understand your context better.

Br,
Mohit

On Tue, 11 Mar 2025, 07:22 harinath kanchu, <kanchuharinath(at)gmail(dot)com>
wrote:

> Hello Postgres Admins,
>
> We have a PostgreSQL setup with a primary and 2 standby nodes, with
> synchronous_commit on and sync_standby_names = ANY 1(standby1,
> standby2).
>
> Assuming we use 'archive_mode = on' on all replicas, in the event of
> a primary failure and subsequent failover to a standby node, there is
> a risk of the primary having failed to upload some WALs to archive
> storage before the failover.
>
> On the other hand, setting 'archive_mode = always' ensures WAL
> retention on standbys but introduces additional inefficiencies - this
> would likely require trying to upload each WAL segment from each
> standby, or at the very least making additional calls to check for WAL
> presence in the archive. This can be costly and may not scale well.
>
> Is there a recommended and optimal way to ensure all WAL files are
> always uploaded, even across failovers?
>
> Here's how we ended up addressing this: we set 'archive_mode =
> always' on all replicas, but in our archive_command on standby nodes,
> we connect to the primary server and query `pg_stat_archiver` to
> determine the last successfully archived WAL file. Based on this
> information, the we either returns an exit code 0 (if primary confirms
> WAL was archived) or a non-zero value (primary has not, or fails to
> respond).
>
> This enables us to retain all WALs on the standby that have not so far
> been archived by the primary, and prevents redundant uploads - but it
> involves additional communication between standbys and the primary.
>
> My idea for a more robust solution is to include the last
> successfully archived WAL file information in the keep-alive messages
> sent from the primary to standby nodes. This would allow standby
> servers to use this data directly, eliminating the need for extra
> queries to the primary.
>
> If the community finds this approach useful, I would be interested in
> contributing a patch to implement this improvement. Please let us know
> your thoughts!
>
> Thanks,
>
> Best,
> Harinath
>
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Laurenz Albe 2025-03-11 21:43:12 Re: On-disk postgres database size too large
Previous Message Maxwell Dreytser 2025-03-11 17:01:52 On-disk postgres database size too large