Re: pg_rewind does not rewind diverging timelines

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com>
Cc: pgsql-hackers mailing list <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_rewind does not rewind diverging timelines
Date: 2026-06-08 10:48:27
Message-ID: 80D76C66-A953-466C-8295-A1CF8365A4D2@yandex-team.ru
Lists: pgsql-hackers

> On 30 Apr 2026, at 13:19, Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com> wrote:
>
> There is one scenario that I assume is known that TLC found, but does not seem to be fixed. It is a relatively rare case, but since the fix is quite easy, I thought I'd share it with you and get feedback.

Hi Mats,

Thanks for working on this. I think the problem is real, but I wonder if
adding a separate UUID to timeline history files is solving it one step
too late.

If two independent promotions manage to choose the same numeric TLI, then
we already have two different histories with the same timeline identifier.
Their history files will also have the same name. A UUID in the file lets
tools detect the mismatch afterwards, but it does not prevent the archive
namespace from containing two different meanings for the same TLI.
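
To make the naming collision concrete, here is a minimal Python sketch. It
assumes the usual convention that a history file is named after the TLI as
eight uppercase hex digits followed by ".history"; the helper name
history_file_name is made up for illustration and is not a PostgreSQL function.

```python
def history_file_name(tli: int) -> str:
    """Model of the history file name: TLI as 8 hex digits plus '.history'.

    Hypothetical helper for illustration only.
    """
    if not 0 < tli <= 0xFFFFFFFF:
        raise ValueError("TLI must be a positive unsigned 32-bit value")
    return f"{tli:08X}.history"


# Two servers promote independently and both end up choosing TLI 3.
name_a = history_file_name(3)
name_b = history_file_name(3)

# Both histories map to one archive entry, whatever the files contain.
print(name_a, name_b, name_a == name_b)
```

Whatever extra data we put inside the file, the name is derived from the TLI
alone, so the archive still has only one slot for both histories.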

In normal deployments with a shared archive this should only be possible
when the history file is not visible to the other promoting server:
either there is no usable restore_command/shared archive, or there is a
race around publishing and observing the history file. In other words, TLI
allocation is not atomic, but it is intended to be coordinated through the
archive.
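
A simplified model of that coordination, as a sketch rather than the actual
server code: a promoting server probes upward from its current TLI while it
can see history files, then takes the next value. The function name
next_sequential_tli and the use of a plain set for "history files visible in
the archive" are assumptions for illustration.

```python
def next_sequential_tli(current_tli, visible_history):
    """Pick the next TLI by probing upward until no history file is visible.

    visible_history is a set of TLIs whose history files this server can see.
    Hypothetical model, not PostgreSQL code.
    """
    newest = current_tli
    while newest + 1 in visible_history:
        newest += 1
    return newest + 1


# Race: neither server sees the other's history file before choosing.
archive = set()
tli_a = next_sequential_tli(1, archive)
tli_b = next_sequential_tli(1, archive)
print("uncoordinated:", tli_a, tli_b)  # both servers pick the same value

# Coordinated: A publishes its history file before B probes the archive.
archive = set()
tli_a = next_sequential_tli(1, archive)
archive.add(tli_a)
tli_b = next_sequential_tli(1, archive)
print("coordinated:", tli_a, tli_b)  # the values differ
```

The allocation is only as safe as the visibility of the history file at the
moment the second server probes.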

Maybe we should keep TimelineID as the actual branch identifier and make
that allocation harder to collide instead of adding a second identifier.
For example, when choosing a new TLI, add some randomness rather than just
using the next sequential value. That would make the race window much less
dangerous: two independent promotions would be extremely unlikely to
choose the same TLI, the history file names would remain distinct, and TLI
would keep its current role as the timeline identifier.
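
A minimal sketch of what I mean, under stated assumptions: the new TLI stays
above everything the server knows about, but a random offset is added on top.
The name next_randomized_tli, the spread parameter, and its default of 2^16
are all hypothetical; I am not proposing these exact values.

```python
import secrets

MAX_TLI = 0xFFFFFFFF  # TimelineID modelled as an unsigned 32-bit value


def next_randomized_tli(current_tli, visible_history, spread=1 << 16):
    """Pick a new TLI above all known ones, plus a random offset.

    spread is the number of candidate values; spread=1 degenerates to the
    plain sequential choice. Hypothetical model, not PostgreSQL code.
    """
    if spread < 1:
        raise ValueError("spread must be at least 1")
    base = max(current_tli, max(visible_history, default=0))
    candidate = base + 1 + secrets.randbelow(spread)
    if candidate > MAX_TLI:
        raise OverflowError("TLI space exhausted in this model")
    return candidate


def same_pick_probability(spread):
    """Chance that two independent promotions from the same base collide."""
    return 1.0 / spread


# Two servers race from TLI 1 without seeing each other's history file.
tli_a = next_randomized_tli(1, set())
tli_b = next_randomized_tli(1, set())
print(tli_a, tli_b, same_pick_probability(1 << 16))
```

With uniform offsets, two promotions racing from the same base collide with
probability 1/spread instead of always. The price in this model is that TLIs
become sparse and the 32-bit space is consumed faster, so the spread would
need some thought.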

This also keeps the operational model simpler. TimelineID is already the
identifier exposed in WAL file names, history file names, logs, and
recovery configuration. If we add UUIDs, we effectively introduce another
identity for the same object, and tools then need to reason about both.
If instead we make TLI allocation collision-resistant under races, the
existing model remains intact.

Does that framing make sense, or am I missing a case where duplicate TLIs
are unavoidable even with a shared archive and a less collision-prone
allocation scheme?

Best regards, Andrey Borodin.
