Quick Links

Re: pg_rewind does not rewind diverging timelines

From:	surya poondla <suryapoondla4(at)gmail(dot)com>
To:	Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: pg_rewind does not rewind diverging timelines
Date:	2026-05-21 22:09:05
Message-ID:	CAOVWO5oZZtniLR4Pyd=e_cS-FNxh837Gbz9TDUnSwWqmbap=bw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Mats,

Thanks for picking this up -- the scenario is a real one and I think the
UUID-tagging approach is a clean way to solve it. v2 applies and builds
without trouble, and the core algorithm reads well to me.
I have a handful of observations that I'd love your thoughts.

Regarding Correctness I have the below thoughts

1. UUIDv7 timestamp epoch.
In StartupXLOG():
TimestampTz now = GetCurrentTimestamp();
generate_uuidv7_r(&uuid_buf, (uint64)(now / 1000),
(uint32)(now % 1000) * 1000);

I think there might be a small mismatch here: GetCurrentTimestamp() returns
microseconds since the Postgres epoch (2000-01-01),
whereas generate_uuidv7_r describes its first argument as milliseconds
since the Unix epoch.
As written that 30-year offset would land in the UUID's timestamp field, so
the resulting UUID wouldn't be a conformant UUIDv7 and wouldn't
time-order against UUIDv7s generated through the SQL functions.

Uniqueness is preserved either way, so the rewind logic still works as
intended but it seemed worth flagging.

I see conversion that's used elsewhere as:
us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE)
* SECS_PER_DAY * USECS_PER_SEC;

Or, since promotion isn't on a hot path, gettimeofday() / time(NULL)
directly would also be fine.

2. EOR-record path, the intent is unclear.

The comment above generate_uuidv7_r() at says:

"The same UUID is written into the history file and later into the
XLOG_END_OF_RECOVERY record so that pg_rewind can distinguish two
servers..."

But from what I can see only the history-file part actually lands.
xl_end_of_recovery is unchanged, CreateEndOfRecoveryRecord() doesn't add
the UUID, and XLogCtl->ThisTimeLineUUID is written under info_lck without a
reader (I couldn't grep it).

The xlog_redo() memset() + Min(rec_len, sizeof(...)) change reads like
preparation for an EOR-struct extension that ended up not being part of the
patch.

Was the EOR-record piece something you intended to keep for a follow-up, or
has it been superseded by the history-file approach?

3. Malformed UUID handling in readTimeLineHistory().

The optional field-4 path is:

if (nfields == 4 && strlen(uuid_str) == UUID_STR_LEN)
{
Datum datum = DirectFunctionCall1(uuid_in,
CStringGetDatum(uuid_str));
...
}

uuid_in() raises ereport(ERROR) on a malformed input, while the surrounding
syntax-error paths in readTimeLineHistory() use FATAL deliberately.
In practice an ERROR during startup ends up being fatal too, so this isn't
strictly a bug but it would be nicer to stay consistent.

Regarding the Tests I have the following thoughts

The two new cases are nice, a few extensions that I think would strengthen
them:
1. A mixed-version case where one side has a zero UUID. That's the path
we're claiming is graceful, but nothing currently exercises it
2. A deeper-divergence case (e.g. TLI1->2->3 vs TLI1->2->3') so that
findCommonAncestorTimeline's loop walks past matching entries
before hitting the mismatch. The 0002 test puts the divergence at
depth 1.
3. A small assertion against the on-disk 00000002.history contents, to pin
down the file format.
4. On 0002 the dependency on restore_command pointing at node_x's pg_wal is
the kind of thing that tends to break under
environment changes. A CHECKPOINT on node_x before the backup, or
wal_keep_size as in 0001, would let the test stand on its own.

I'm happy to keep reviewing/contributing, thanks again for working on it.

Regards,
Surya Poondla

In response to

pg_rewind does not rewind diverging timelines at 2026-04-30 08:19:21 from Mats Kindahl

Browse pgsql-hackers by date

	From	Date	Subject
Previous Message	Dave Cramer	2026-05-21 21:55:35	Re: Patch for bind message regarding the number of parameters and result column format codes