| From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
|---|---|
| To: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Bug? pg_rewind produces unusable but starting database with standby recovery |
| Date: | 2026-06-10 07:53:57 |
| Message-ID: | 05DEB465-3C99-417E-B7FA-275A28068D90@yandex-team.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On 15 May 2026, at 12:50, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> wrote:
>
> Sometimes pg_rewind can generate a state where the stated
> minRecoveryPoint is beyond the actual available wal.
Hi Zsolt,
I looked at both patches. Some notes.
1. Patch 0001 (unconditional minRecoveryPoint check) looks necessary and
safe to me.
The new check fires only when LocalMinRecoveryPoint is valid, and that
only happens when InArchiveRecovery is true. In plain crash recovery
(no backup_label, no signal files) StartupXLOG sets LocalMinRecoveryPoint
to InvalidXLogRecPtr, so "EndOfLog < LocalMinRecoveryPoint" is always
false and the check is a no-op. InArchiveRecovery becomes true only when
a backup_label is present, or when archive recovery was requested. In
all those cases minRecoveryPoint > EndOfLog really does mean we stopped
short of consistency, so erroring out is the only correct option.
The only case the old code missed is: backup_label present, but
backupEndRequired is false and no signal files. The single producer of
that combination is "BACKUP METHOD: pg_rewind" (streamed base backups
set backupEndRequired and were already caught). So the new behavior is
limited exactly to the pg_rewind hole, with no regression for streamed
backups or crash recovery.
2. Patch 0002 (remote source WAL race) I think goes the wrong way.
Moving the consistency LSN capture before collecting the file list
breaks the rule that backup stop relies on: the consistency point must
be established AFTER the data pages are read, never before. The source
is a live server and pg_rewind reads pages from disk one by one with
pg_read_binary_file(). A page modified on the source after we captured
flush LSN, but before we copy it, lands on the target with an LSN
greater than minRecoveryPoint. The target can then reach minRecoveryPoint
and be declared consistent while holding pages from the future whose WAL
was never replayed - a torn, cross-page inconsistent state.
The good half of the patch is insert LSN -> flush LSN. Note that
pg_rewind reads pages from disk, and by the WAL-before-data rule any
on-disk page has LSN <= flush LSN. So flush LSN is the tight upper bound
on what we can copy: insert LSN overshoots (may point past any copied
page, possibly into WAL still only in shared memory), flush-before-copy
undershoots (torn pages), and flush-after-copy is exactly right: it
covers every copied page and is durable.
So I think the fix should keep the capture where it was (end of
perform_rewind, after the copy) and only switch insert LSN to flush LSN.
Missing WAL up to minRecoveryPoint is then handled like a base backup:
the target fetches it via streaming or restore_command, and patch 0001
turns the unreachable case into a clean FATAL instead of silent
corruption. This also means the 012_remote_wal_race test, which checks
that the copied pg_wal alone covers minRecoveryPoint, encodes the
questionable assumption and would need to be reworked.
3. Longer term it would be nice to converge with backup stop.
pg_rewind already writes a backup_label and the target already recovers
like a backup. What is hand-rolled is the stop/consistency point: rewind
computes an LSN itself and stores it into minRecoveryPoint. Both issues
above live in that hand-rolled part. Establishing the consistency target
the same way pg_backup_stop() does (after the copy), or even driving it
from pg_backup_start()/pg_backup_stop() on the live source, would make
the consistency-after-copy invariant automatic and let StartupXLOG
enforce it uniformly.
Best regards, Andrey Borodin.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-06-10 08:05:01 | Re: [PATCH] Fix typos in pqsignal.c comment |
| Previous Message | Ewan Young | 2026-06-10 07:32:28 | Re: [PATCH] Fix typos in pqsignal.c comment |