| From: | cca5507 <cca5507(at)qq(dot)com> |
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Bug? pg_rewind produces unusable but starting database with standby recovery |
| Date: | 2026-06-11 03:16:03 |
| Message-ID: | tencent_97353C828324BBD950FD4283DAE314379907@qq.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
> 2. Patch 0002 (remote source WAL race) I think goes the wrong way.
>
> Moving the consistency LSN capture before collecting the file list
> breaks the rule that backup stop relies on: the consistency point must
> be established AFTER the data pages are read, never before. The source
> is a live server and pg_rewind reads pages from disk one by one with
> pg_read_binary_file(). A page modified on the source after we captured
> flush LSN, but before we copy it, lands on the target with an LSN
> greater than minRecoveryPoint. The target can then reach minRecoveryPoint
> and be declared consistent while holding pages from the future whose WAL
> was never replayed - a torn, cross-page inconsistent state.
>
> The good half of the patch is insert LSN -> flush LSN. Note that
> pg_rewind reads pages from disk, and by the WAL-before-data rule any
> on-disk page has LSN <= flush LSN. So flush LSN is the tight upper bound
> on what we can copy: insert LSN overshoots (may point past any copied
> page, possibly into WAL still only in shared memory), flush-before-copy
> undershoots (torn pages), and flush-after-copy is exactly right: it
> covers every copied page and is durable.
>
> So I think the fix should keep the capture where it was (end of
> perform_rewind, after the copy) and only switch insert LSN to flush LSN.
> Missing WAL up to minRecoveryPoint is then handled like a base backup:
> the target fetches it via streaming or restore_command, and patch 0001
> turns the unreachable case into a clean FATAL instead of silent
> corruption. This also means the 012_remote_wal_race test, which checks
> that the copied pg_wal alone covers minRecoveryPoint, encodes the
> questionable assumption and would need to be reworked.
+1. I reported a related bug and wrote a patch doing "insert LSN -> flush LSN" in:
https://commitfest.postgresql.org/patch/6679/
--
Regards,
ChangAo Chen
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-06-11 03:31:26 | pg_restore handles extended statistics inconsistently with statistics data |
| Previous Message | cca5507 | 2026-06-11 03:03:17 | Re: [BUG] Take a long time to reach consistent after pg_rewind |