
Re: pg_rewind does not rewind diverging timelines

From:	Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: pg_rewind does not rewind diverging timelines
Date:	2026-05-01 16:06:20
Message-ID:	CAN305gC0VE8zB=guccMj-7cJTW4oOAmTYCktaUKSzyOup=HHEw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 30, 2026 at 10:19 AM Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com>
wrote:

> Hi all,
>
> I have been playing around with various promotion scenarios to check if it
> is possible to lose writes in more complicated scenarios involving
> promotions and uses of synchronous_standby_names and decided to create a
> TLA+ model for streaming replication involving promotions and check those
> with TLC. You can find the models at [1] if you're interested.
>
> TLC found one scenario that I assume is known but that does not seem to
> be fixed. It is a relatively rare case, but since the fix is quite easy,
> I thought I'd share it with you and get feedback.
>
> The scenario can occur if you're unlucky and have more than one crash when
> promoting standbys to be primaries, and goes like this:
>
> You have three servers, S1, S2, and S3. S1 is primary and S2 and S3 are
> standbys. All are on timeline (TLI) 1.
>
> 1. S1 crashes
> 2. S1 recovers and starts promotion. It writes XLOG_END_OF_RECOVERY (EOR)
> for TLI 2 to the WAL.
> 3. S1 manages to write some records W1 to the WAL.
> 4. Before the EOR is replicated to any standby, S1 crashes again. It is
> now on TLI 2 and has some changes that exist nowhere else.
> 5. S2 is promoted. It writes an EOR for TLI 2 (since it is not aware of
> any other timeline) to the WAL.
> 6. S2 writes some records W2 to the WAL. Now S1 is on one version of
> TLI 2 (call it TLI 2.1) and S2 is on another (TLI 2.2).
> 7. S1 recovers and wants to join as a standby. You run pg_rewind to get
> rid of the extra data, but since S2 is also on TLI 2, pg_rewind will
> happily assume that both are on the same timeline.
> 8. S1 is now a standby but still has the extra records W1 both in the WAL
> and in the database.
>
> The fix (see attached draft) is quite simple: add a UUID to the EOR and to
> the history file. When comparing timelines, don't check only the TLI, also
> check the UUID. If they do not both match, go back further until you find a
> timeline where both the TLI and the timeline UUID match, and then do the
> usual fandango to find the good LSN to rewind to.
>
> [1]: https://github.com/mkindahl/tla-postgres
>

Here is an updated version of the patch. It seems that it is not necessary
to extend the XLOG_END_OF_RECOVERY record with the UUID; adding it to the
history files is enough. The scenario is still the same, though, and can
cause servers to diverge, possibly silently. I have an additional test case
with a divergence going back three promotions.
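
To make the comparison concrete, here is a minimal Python sketch of the
ancestor search with and without the UUID check. It is not the attached patch
and not pg_rewind's actual code: the HistoryEntry layout, the uuid field, the
last_common_timeline name, and all LSN values are assumptions made up for
illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HistoryEntry:
    """One timeline in a server's history (hypothetical layout)."""
    tli: int    # timeline ID
    uuid: str   # identifier created at promotion (assumed field)
    begin: int  # LSN where this timeline starts
    end: int    # LSN where this timeline was left, or current end of WAL


def last_common_timeline(
    a: List[HistoryEntry], b: List[HistoryEntry], use_uuid: bool = True
) -> Tuple[int, int]:
    """Return (index, divergence LSN) of the last timeline both histories share."""
    i = 0
    while i < min(len(a), len(b)):
        if a[i].tli != b[i].tli or a[i].begin != b[i].begin:
            break
        if use_uuid and a[i].uuid != b[i].uuid:
            break  # same TLI number, but created by an independent promotion
        i += 1
    if i == 0:
        raise ValueError("histories share no common timeline")
    last = i - 1
    # The histories agree up to the point where either side left this timeline.
    return last, min(a[last].end, b[last].end)


# The scenario from the quoted mail, with made-up LSNs and placeholder UUIDs:
# both servers leave TLI 1 at LSN 100 and independently create "TLI 2".
s1_history = [HistoryEntry(1, "u0", 0, 100), HistoryEntry(2, "uA", 100, 150)]
s2_history = [HistoryEntry(1, "u0", 0, 100), HistoryEntry(2, "uB", 100, 180)]

print(last_common_timeline(s1_history, s2_history, use_uuid=False))  # (1, 150)
print(last_common_timeline(s1_history, s2_history, use_uuid=True))   # (0, 100)
```

With TLI-only matching, S1 and S2 appear to share TLI 2, and the reported
divergence point is the end of S1's WAL, which makes S1 look like a plain
ancestor of S2. With the UUID check, the search stops at TLI 1 and reports
LSN 100, the point where the two promotions branched off.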
--
Best wishes,
Mats Kindahl, Multigres Developer, Supabase

Attachment	Content-Type	Size
v2.0002-pg_rewind-test-rewind-across-UUID-mismatched-TLI.patch	text/x-patch	6.6 KB
v2.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patch	text/x-patch	28.0 KB

In response to

pg_rewind does not rewind diverging timelines at 2026-04-30 08:19:21 from Mats Kindahl
