Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery

From:	Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com>
To:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Date:	2026-03-18 09:49:25
Message-ID:	CA+nrD2eJUfLq8_Ed7fv-7LrmkOoLJ28LwAHh-Rjjg4RU9KOYCg@mail.gmail.com
Lists:	pgsql-hackers

Here are the v4 patches implementing what I described above.

On top of Xuneng's v3 (keeping the wait_for_event and scoped log
window test improvements), the main changes are:

- The wait is now capped at one wal_segment_size. If the gap is
larger, we skip the wait and let START_REPLICATION fail normally
so the startup process can fall back to archive recovery. This avoids
indefinite polling when the upstream is more than a segment behind.

- The first "ahead of flush position" message is logged at LOG,
subsequent ones at DEBUG1, to cut down on noise during a long wait.
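
To make the two rules above concrete, here is a minimal standalone sketch
of the decision logic. It is not taken from the patch: the type and
function names (lsn_t, choose_reconnect_action, ahead_message_level) are
hypothetical, LSNs are modelled as plain 64-bit integers, and treating a
gap of exactly one segment as "still wait" is my assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not identifiers from the patch. */
typedef uint64_t lsn_t;

typedef enum
{
    ACTION_STREAM,      /* upstream has already flushed past our start point */
    ACTION_WAIT,        /* upstream is behind, but by at most one segment */
    ACTION_FALL_BACK    /* gap too large: let START_REPLICATION fail */
} reconnect_action;

typedef enum
{
    LEVEL_LOG,
    LEVEL_DEBUG1
} log_level;

/*
 * Decide what to do when the requested start position may be ahead of the
 * upstream's flush position. The wait is capped at one WAL segment; the
 * inclusive boundary is an assumption of this sketch.
 */
static reconnect_action
choose_reconnect_action(lsn_t start_lsn, lsn_t upstream_flush_lsn,
                        uint64_t wal_segment_size)
{
    if (start_lsn <= upstream_flush_lsn)
        return ACTION_STREAM;
    if (start_lsn - upstream_flush_lsn <= wal_segment_size)
        return ACTION_WAIT;
    return ACTION_FALL_BACK;
}

/* First "ahead of flush position" message at LOG, later ones at DEBUG1. */
static log_level
ahead_message_level(unsigned int messages_already_logged)
{
    return messages_already_logged == 0 ? LEVEL_LOG : LEVEL_DEBUG1;
}
```

With a 16 MB segment size, a start position 0x100 bytes ahead of the
upstream flush position yields ACTION_WAIT, while one two segments ahead
yields ACTION_FALL_BACK.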

Two patches attached: v4-0001 for master (extends the
walrcv_identify_system API with an optional server_lsn output
parameter) and v4-backpatch-0001 for stable branches (uses a global
variable to preserve ABI, per Alvaro's suggestion).
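
For readers unfamiliar with the master/back-branch split, the following
is a generic sketch of the two shapes, not the actual walreceiver code.
All names here (identify_system_v2, identify_system_stable,
last_identified_server_lsn, DEMO_SERVER_LSN) are hypothetical, and the
functions return canned values instead of talking to a server.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Canned value standing in for what the upstream would report. */
#define DEMO_SERVER_LSN ((lsn_t) 0x3000060)

/*
 * master-style: the function gains an optional output parameter.
 * Callers that do not need the server LSN pass NULL.
 */
static int
identify_system_v2(uint32_t *timeline, lsn_t *server_lsn)
{
    *timeline = 1;
    if (server_lsn != NULL)
        *server_lsn = DEMO_SERVER_LSN;
    return 0;
}

/*
 * back-branch style: the signature stays as it was, so already-compiled
 * callers keep working; the extra result is published through a global.
 */
static lsn_t last_identified_server_lsn = 0;

static int
identify_system_stable(uint32_t *timeline)
{
    *timeline = 1;
    last_identified_server_lsn = DEMO_SERVER_LSN;
    return 0;
}
```

The trade-off is the usual one: the extra parameter is the cleaner
interface, while the global avoids changing a function signature that
existing binaries may have been built against.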

Both pass the new TAP test.

Best regards,
Marco

Attachment	Content-Type	Size
v4-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch	text/x-patch	14.8 KB
v4-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch	text/x-patch	16.6 KB

In response to

Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery at 2026-03-18 08:33:47 from Marco Nenciarini
