| From: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-03-17 09:31:23 |
| Message-ID: | CA+nrD2cVZ2YdfQpk_qwFUzmkR4N5_8H9yL3NVodAmTq3gNDVpg@mail.gmail.com |
| Lists: | pgsql-hackers |

Since this bug dates back to 9.3, the fix will likely need backpatching.
The v2 patch changes the walrcv_identify_system() signature, which would
be an ABI break on stable branches (walrcv_identify_system_fn is a
function pointer in the WalReceiverFunctionsType struct).

Attached is a backpatch-compatible variant that avoids the API change.
Instead of adding a parameter, libpqrcv_identify_system() stores the
flush position in a new global variable (WalRcvIdentifySystemLsn), and
the walreceiver reads it directly. The fix logic and TAP test are
otherwise identical.
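
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the idea, not the code from the attached patch. It assumes simplified stand-ins for XLogRecPtr and TimeLineID, a toy callback table in place of WalReceiverFunctionsType, and hypothetical demo_* functions. Only the variable name WalRcvIdentifySystemLsn comes from the description above. The point is that the function-pointer signature stays untouched while the extra value travels through a global that the caller reads right after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumption, not the real headers). */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* The callback signature is frozen: changing it would break the ABI. */
typedef const char *(*identify_system_fn) (TimeLineID *primary_tli);

/* Toy callback table standing in for WalReceiverFunctionsType. */
typedef struct ReceiverFunctions
{
    identify_system_fn identify_system;
} ReceiverFunctions;

/* Side channel: filled in by the callback, read by the caller. */
XLogRecPtr WalRcvIdentifySystemLsn = InvalidXLogRecPtr;

/* Hypothetical callback: returns a system identifier and a timeline as
 * before, and additionally records the upstream flush position. */
static const char *
demo_identify_system(TimeLineID *primary_tli)
{
    *primary_tli = 3;
    WalRcvIdentifySystemLsn = ((XLogRecPtr) 1 << 32) | 0x2A000060;
    return "7000000000000000001";
}

static const ReceiverFunctions demo_functions = {demo_identify_system};

/* Hypothetical caller: invokes the callback through the unchanged
 * pointer type, then picks up the extra value from the global. */
XLogRecPtr
demo_fetch_upstream_flush_lsn(TimeLineID *primary_tli)
{
    assert(demo_functions.identify_system != NULL);

    /* Reset first so a stale value from an earlier call is never reused. */
    WalRcvIdentifySystemLsn = InvalidXLogRecPtr;
    (void) demo_functions.identify_system(primary_tli);

    return WalRcvIdentifySystemLsn;
}
```

The trade-off is the usual one for globals: the value is only meaningful immediately after the call, which is one reason an extended signature reads more cleanly where no ABI constraint applies.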

For master I'd still prefer the v2 approach with the extended signature,
since it's cleaner and there's no ABI constraint.

Best regards,
Marco

| Attachment | Content-Type | Size |
|---|---|---|
| v2-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-archiv.patch | text/x-patch | 12.7 KB |