Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery

From:	Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com>
To:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Date:	2026-03-16 21:49:44
Message-ID:	CA+nrD2dRNzWAxc227uqy5tdFEk-UmK7R5965GYL9yzLzP+g6+Q@mail.gmail.com
Lists:	pgsql-hackers

Attached is a v2 patch that implements the "handshake clamp" approach
Xuneng suggested. Rather than tracking lastStreamedFlush in
process-local state (which doesn't survive a cascade restart, as
Fujii-san demonstrated), it uses the WAL flush position already
returned by IDENTIFY_SYSTEM.

The walreceiver now checks the upstream's flush position before issuing
START_REPLICATION. If the requested startpoint is ahead (on the same
timeline), it waits for wal_retrieve_retry_interval and retries. This
works across restarts since it queries the upstream's live position on
every connection attempt, and requires no new state variables.

When timelines differ, we let START_REPLICATION handle the timeline
negotiation as before.
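
For readers who want the check spelled out, the decision described above can be sketched roughly as follows. This is an illustration only, not the code in the attached patch: the function name clamp_decision, the ClampDecision enum, and the typedefs are hypothetical stand-ins, and the flush position and timeline are assumed to be the values returned by IDENTIFY_SYSTEM.

```c
#include <stdint.h>

/* Stand-ins for PostgreSQL's types, so the sketch compiles on its own. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Hypothetical result type: what to do before START_REPLICATION. */
typedef enum
{
    CLAMP_START,    /* go ahead and issue START_REPLICATION */
    CLAMP_WAIT      /* upstream is behind: wait, then retry the handshake */
} ClampDecision;

/*
 * Compare the requested startpoint with the upstream's flush position
 * (assumed to come from IDENTIFY_SYSTEM on this connection attempt).
 */
static ClampDecision
clamp_decision(XLogRecPtr startpoint, TimeLineID start_tli,
               XLogRecPtr upstream_flush, TimeLineID upstream_tli)
{
    /* Different timelines: leave negotiation to START_REPLICATION. */
    if (start_tli != upstream_tli)
        return CLAMP_START;

    /* Same timeline, but we would ask for WAL the upstream lacks. */
    if (startpoint > upstream_flush)
        return CLAMP_WAIT;

    return CLAMP_START;
}
```

On CLAMP_WAIT, a caller would sleep for wal_retrieve_retry_interval and repeat the handshake, so each attempt sees the upstream's current flush position and nothing needs to be remembered across restarts.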

The patch includes a TAP test (053_cascade_reconnect.pl) that
reproduces the scenario and verifies the fix.

Attachment	Content-Type	Size
v2-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch	text/x-patch	15.0 KB

In response to

Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery at 2026-02-02 02:16:56 from Xuneng Zhou

Responses

Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery at 2026-03-17 01:04:16 from Xuneng Zhou
Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery at 2026-03-17 09:31:23 from Marco Nenciarini
