Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery

From: Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Date: 2026-03-17 09:31:23
Message-ID: CA+nrD2cVZ2YdfQpk_qwFUzmkR4N5_8H9yL3NVodAmTq3gNDVpg@mail.gmail.com
Lists: pgsql-hackers

Since this bug dates back to 9.3, the fix will likely need backpatching.
The v2 patch changes the walrcv_identify_system() signature, which would
be an ABI break on stable branches (walrcv_identify_system_fn is a
function pointer in the WalReceiverFunctionsType struct).

Attached is a backpatch-compatible variant that avoids the API change.
Instead of adding a parameter, libpqrcv_identify_system() stores the
flush position in a new global variable (WalRcvIdentifySystemLsn), and
the walreceiver reads it directly. The fix logic and TAP test are
otherwise identical.

For master I'd still prefer the v2 approach with the extended signature,
since it's cleaner and there's no ABI constraint.

Best regards,
Marco

Attachment: v2-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-archiv.patch (text/x-patch, 12.7 KB)
