Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery
Date: 2026-03-21 10:07:16
Message-ID: CABPTF7XBtq165xQjTYG7+-Uv+A5CqtdBwPUnA_crm2duN=PPrg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 20, 2026 at 8:40 PM Marco Nenciarini
<marco(dot)nenciarini(at)enterprisedb(dot)com> wrote:
>
> On Fri, Mar 20, 2026 at 4:33 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > After taking a closer look, I'm less certain about this. I'll
> > investigate further. Could you also explain why you think this is the
> > case?
>
> The mechanism is in RequestXLogStreaming (walreceiverfuncs.c, around
> line 276): it explicitly truncates recptr to the segment start before
> passing it to the walreceiver. So even when both nodes have replayed
> the same records, the cascade's startpoint lands at the beginning of
> the next segment while the upstream's GetStandbyFlushRecPtr returns
> replayPtr somewhere inside the current one.
>
> I covered this in more detail in my reply to your previous message.
>
> Best regards,
> Marco
>

    if (XLogSegmentOffset(recptr, wal_segment_size) != 0)
        recptr -= XLogSegmentOffset(recptr, wal_segment_size);

I think this rounds recptr down to the start of the segment that
contains targetPagePtr + reqLen. It does not round up. So if both
standbys have replayed the same record, the cascade's startpoint would
land at the beginning of the current segment, which is a legitimate LSN
to request from the upstream server. This case would be fine. Am I
missing something here?
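
To make the arithmetic concrete, here is a minimal standalone sketch of
the same rounding, not the PostgreSQL source itself. SEGMENT_OFFSET and
round_down_to_segment are hypothetical names introduced for
illustration; the macro is a local stand-in for XLogSegmentOffset and
assumes the segment size is a power of two, and XLogRecPtr is modeled
as a plain 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit byte position in WAL. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical local stand-in for XLogSegmentOffset(): the byte offset of
 * ptr within its segment. Assumes segsz is a power of two.
 */
#define SEGMENT_OFFSET(ptr, segsz) ((ptr) & ((XLogRecPtr) (segsz) - 1))

/*
 * Apply the same truncation as the quoted snippet: subtract the offset
 * within the segment, so the result is the start of the segment that
 * contains recptr. The result can never exceed the input.
 */
XLogRecPtr
round_down_to_segment(XLogRecPtr recptr, uint32_t segsz)
{
    /* Guard the power-of-two assumption the mask relies on. */
    assert(segsz != 0 && (segsz & (segsz - 1)) == 0);

    if (SEGMENT_OFFSET(recptr, segsz) != 0)
        recptr -= SEGMENT_OFFSET(recptr, segsz);

    return recptr;
}
```

With 16 MB segments (0x1000000 bytes), an input of 0/3000060 yields
0/3000000, the start of the segment containing it, and not 0/4000000.
An input already on a segment boundary is returned unchanged.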

--
Best,
Xuneng
