From: | Mohit Mishra <mohitmishra786687(at)gmail(dot)com> |
---|---|
To: | harinath kanchu <kanchuharinath(at)gmail(dot)com> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: Any better way to ensure WAL continuity over failovers ? |
Date: | 2025-03-11 17:15:28 |
Message-ID: | CAGB=j7q-_iFZ=TMhzWYNKqt6VY3fbKJhEguwSw3rRcHqo+8_gA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hello Harinath,
Do you have any status of what happened later you did that change?
It will help understand your context better.
Br,
Mohit
On Tue, 11 Mar 2025, 07:22 harinath kanchu, <kanchuharinath(at)gmail(dot)com>
wrote:
> Hello Postgres Admins,
>
> We have a PostgreSQL setup with a primary and 2 standby nodes, with
> synchronous_commit on and sync_standby_names = ANY 1(standby1,
> standby2).
>
> Assuming we use 'archive_mode = on' on all replicas, in the event of
> a primary failure and subsequent failover to a standby node, there is
> a risk of the primary having failed to upload some WALs to archive
> storage before the failover.
>
> On the other hand, setting 'archive_mode = always' ensures WAL
> retention on standbys but introduces additional inefficiencies - this
> would likely require trying to upload each WAL segment from each
> standby, or at the very least making additional calls to check for WAL
> presence in the archive. This can be costly and may not scale well.
>
> Is there a recommended and optimal way to ensure all WAL files are
> always uploaded, even across failovers?
>
> Here's how we ended up addressing this: we set 'archive_mode =
> always' on all replicas, but in our archive_command on standby nodes,
> we connect to the primary server and query `pg_stat_archiver` to
> determine the last successfully archived WAL file. Based on this
> information, the we either returns an exit code 0 (if primary confirms
> WAL was archived) or a non-zero value (primary has not, or fails to
> respond).
>
> This enables us to retain all WALs on the standby that have not so far
> been archived by the primary, and prevents redundant uploads - but it
> involves additional communication between standbys and the primary.
>
> My idea for a more robust solution is to include the last
> successfully archived WAL file information in the keep-alive messages
> sent from the primary to standby nodes. This would allow standby
> servers to use this data directly, eliminating the need for extra
> queries to the primary.
>
> If the community finds this approach useful, I would be interested in
> contributing a patch to implement this improvement. Please let us know
> your thoughts!
>
> Thanks,
>
> Best,
> Harinath
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2025-03-11 21:43:12 | Re: On-disk postgres database size too large |
Previous Message | Maxwell Dreytser | 2025-03-11 17:01:52 | On-disk postgres database size too large |