| From: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-03-20 11:45:02 |
| Message-ID: | CA+nrD2ePAV35aMF_fz5qEbfaJf8aZJHwS4-mY6KaZWH_KBg9rw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Xuneng,
On Fri, Mar 20, 2026 at 1:52 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> The one-segment bound holds for the case where both nodes have
> replayed exactly the same WAL -- the gap comes from
> RequestXLogStreaming truncating recptr to the segment boundary, so
> startpoint is always at the start of the next segment while
> GetStandbyFlushRecPtr returns replayPtr within the current one. I
> think that part of the analysis is correct.
Right, I verified it: RequestXLogStreaming (walreceiverfuncs.c)
explicitly truncates recptr to the segment start to avoid creating
broken partial segments. So the cascade always asks to stream from
the beginning of a segment, while GetStandbyFlushRecPtr on the
upstream returns replayPtr somewhere inside that same segment. The
gap is the distance from the segment start to replayPtr, always less
than wal_segment_size.
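To make the arithmetic concrete, here is a minimal standalone sketch of the truncation and the resulting sub-segment offset. It is not the PostgreSQL source: `WalPos`, `segment_start`, and `offset_in_segment` are hypothetical names, and the sketch assumes the segment size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (byte offset in the WAL stream). */
typedef uint64_t WalPos;

/*
 * Round pos down to the first byte of the segment that contains it.
 * Assumes seg_size is a non-zero power of two.
 */
WalPos
segment_start(WalPos pos, uint64_t seg_size)
{
    assert(seg_size != 0 && (seg_size & (seg_size - 1)) == 0);
    return pos - (pos & (seg_size - 1));
}

/*
 * Distance from the start of the containing segment to pos.
 * By construction this is always in the range [0, seg_size).
 */
uint64_t
offset_in_segment(WalPos pos, uint64_t seg_size)
{
    return pos - segment_start(pos, seg_size);
}
```

With a 16 MB segment size (0x1000000), a position of 0x3000128 truncates to 0x3000000, and its offset within the segment is 0x128 bytes, which is necessarily smaller than one segment.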
> But the gap can legitimately be multiple segments. Consider: the
> upstream standby goes down (or is restarted for maintenance) while the
> primary keeps generating and archiving WAL.
Agreed, that scenario produces a multi-segment gap. But I think
handling it is out of scope for this patch. The bug we're fixing is
that a cascade can never start streaming from an upstream that is
fully caught up, because of the RequestXLogStreaming truncation.
Making the walreceiver wait for an upstream that is genuinely many
segments behind would be a feature, not a bug fix, and it would need
its own discussion about the right behavior.
The wal_segment_size threshold keeps the fix narrowly targeted at
this specific bug: absorb the sub-segment gap that arises from the
truncation, let everything else fail as before.
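The threshold rule can be sketched as a three-way decision. This is only an illustration, not code from the patch: `GapAction` and `classify_start_request` are hypothetical names, the gap is assumed to be measured as the requested start position minus the upstream's flush position, and the strict `<` comparison is an assumption based on the gap being described as always less than wal_segment_size.

```c
#include <stdint.h>

/* Hypothetical outcome of comparing a requested start point to the upstream. */
typedef enum
{
    START_STREAMING,    /* upstream already has the requested position */
    WAIT_FOR_UPSTREAM,  /* sub-segment gap: absorb it by waiting */
    FAIL_AS_BEFORE      /* gap of one segment or more: keep the old behavior */
} GapAction;

/*
 * Decide what to do when a downstream asks to stream from requested_start
 * and the upstream has flushed WAL only up to upstream_flush.
 */
GapAction
classify_start_request(uint64_t requested_start, uint64_t upstream_flush,
                       uint64_t seg_size)
{
    if (requested_start <= upstream_flush)
        return START_STREAMING;

    /* requested_start is ahead of the upstream; measure by how much. */
    if (requested_start - upstream_flush < seg_size)
        return WAIT_FOR_UPSTREAM;

    return FAIL_AS_BEFORE;
}
```

For example, with 16 MB segments, a request for 0x4000000 against an upstream flushed to 0x3000128 falls in the waiting case, while a request for 0x6000000 against the same upstream still fails.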
> If there's a consensus for this and the fix of one-segment gap, the
> current tap test would become non-deterministic.
Good catch. I'll tighten the test to make sure the gap stays within
one segment.
> I think the difference is that -- during normal streaming,
> wal_receiver_timeout will eventually fire and kill the connection,
> whereas the catch-up polling loop has no such timeout.
Fair point. I'll add a wal_receiver_timeout check to the polling
loop so the walreceiver exits if it has been waiting too long, same
as it would during normal streaming.
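A rough sketch of what a timeout-bounded polling loop could look like follows. None of this is taken from the patch: the names are hypothetical, the clock is simulated with a fixed tick, and the sketch assumes the timeout is measured from the moment the wait began. The one thing borrowed from PostgreSQL is the convention that a wal_receiver_timeout of zero disables the timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of a simulated catch-up wait. */
typedef enum { CAUGHT_UP, TIMED_OUT, STILL_WAITING } PollResult;

/* True once the wait has lasted at least timeout_ms; 0 disables the check. */
bool
catchup_wait_timed_out(int64_t wait_start_ms, int64_t now_ms, int timeout_ms)
{
    if (timeout_ms <= 0)
        return false;
    return now_ms - wait_start_ms >= timeout_ms;
}

/*
 * Simulated polling loop: flush_by_tick[i] is the upstream flush position
 * observed on the i-th poll, and polls are tick_ms apart. The wait starts
 * at time 0.
 */
PollResult
poll_for_catchup(const uint64_t *flush_by_tick, int nticks, int tick_ms,
                 uint64_t requested_start, int timeout_ms)
{
    for (int i = 0; i < nticks; i++)
    {
        if (flush_by_tick[i] >= requested_start)
            return CAUGHT_UP;

        /* Time elapsed once this poll interval has passed. */
        if (catchup_wait_timed_out(0, (int64_t) (i + 1) * tick_ms, timeout_ms))
            return TIMED_OUT;
    }
    return STILL_WAITING;   /* ran out of simulated observations */
}
```

The point of the check is that a stalled upstream makes the loop exit with an error path instead of polling forever; with the timeout disabled, the loop keeps waiting as long as observations keep arriving.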
Updated patches attached.
Best regards,
Marco
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 17.9 KB |
| v5-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 16.2 KB |