| From: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-04-27 16:50:28 |
| Message-ID: | CA+nrD2cc9nvrT5_OnQ_PGfFaL9nMTzeA6zwQ+u9a10hexwJaXg@mail.gmail.com |
| Lists: | pgsql-hackers |
v7 patches attached. No code changes from v6: the patches are just
rebased on current master to remove a minor offset, and the backpatch
file is renamed with a "nocfbot-" prefix so the commitfest bot picks
up only the master patch.
On Mon, Apr 27, 2026 at 6:00 PM Marco Nenciarini <
marco(dot)nenciarini(at)enterprisedb(dot)com> wrote:
> Registered in PG20-1: https://commitfest.postgresql.org/patch/6716/
>
> On Sat, Mar 21, 2026 at 11:52 AM Marco Nenciarini <
> marco(dot)nenciarini(at)enterprisedb(dot)com> wrote:
>
>> Here are the v6 patches.
>>
>> Xuneng correctly pointed out that RequestXLogStreaming rounds down,
>> not up, so it isn't the cause of the gap. The actual mechanism is
>> that archive recovery processes whole segment files: after both nodes
>> replay the same archived segment N, the cascade's next read position
>> lands at the start of segment N+1, while the upstream's
>> GetStandbyFlushRecPtr returns replayPtr inside segment N.
>>
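The segment-granularity gap described above can be illustrated with plain integer arithmetic. This is a minimal sketch, not code from the patch: `WalPos`, `wal_segment_of`, `next_read_after_archive`, and `cascade_is_ahead` are hypothetical names, `WalPos` stands in for a 64-bit WAL position, and `last_replayed` is assumed to lie strictly inside a segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL byte position. */
typedef uint64_t WalPos;

/* Number of the segment that contains pos. */
static uint64_t
wal_segment_of(WalPos pos, uint64_t seg_size)
{
    assert(seg_size > 0);
    return pos / seg_size;
}

/*
 * Archive recovery consumes whole segment files, so after replaying the
 * archived segment that contains last_replayed, the next read position is
 * assumed to be the first byte of the following segment.
 */
static WalPos
next_read_after_archive(WalPos last_replayed, uint64_t seg_size)
{
    return (wal_segment_of(last_replayed, seg_size) + 1) * seg_size;
}

/* True when the cascade would ask for WAL the upstream does not have yet. */
static bool
cascade_is_ahead(WalPos cascade_start, WalPos upstream_flush)
{
    return cascade_start > upstream_flush;
}
```

With a 16 MB segment size and a replay position 4096 bytes into segment 5, `next_read_after_archive` yields the start of segment 6, which is ahead of the upstream's replay position in segment 5, so `cascade_is_ahead` reports true.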
>> Changes from v5:
>>
>> - Updated the code comment and commit message to describe the correct
>> root cause (archive recovery segment granularity, not
>> RequestXLogStreaming truncation).
>>
>> - Reset the catchup state when the upstream is no longer behind.
>> Without this, if the walreceiver streams successfully, the
>> connection then breaks, and the walreceiver loops back to find
>> itself ahead again, the stale deadline from the previous wait would
>> cause an immediate timeout.
>>
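The stale-deadline problem in the second item can be sketched as a small state machine. This is an illustration under assumptions, not the patch's code: `CatchupState`, `CatchupAction`, and `catchup_check` are hypothetical names, and times are plain millisecond counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WalPos;    /* hypothetical stand-in for a WAL position */
typedef int64_t  TimeMs;    /* hypothetical millisecond timestamp */

typedef struct CatchupState
{
    bool   waiting;         /* are we waiting for the upstream to catch up? */
    TimeMs deadline;        /* meaningful only while waiting is true */
} CatchupState;

typedef enum
{
    CATCHUP_PROCEED,        /* upstream has the WAL we need: start streaming */
    CATCHUP_WAIT,           /* upstream is behind: keep waiting */
    CATCHUP_TIMEOUT         /* upstream stayed behind past the deadline */
} CatchupAction;

static CatchupAction
catchup_check(CatchupState *st, WalPos start_pos, WalPos upstream_flush,
              TimeMs now, TimeMs timeout)
{
    assert(st != NULL);

    if (start_pos <= upstream_flush)
    {
        /* Upstream is no longer behind: forget the old deadline. */
        st->waiting = false;
        st->deadline = 0;
        return CATCHUP_PROCEED;
    }

    if (!st->waiting)
    {
        /* First time we see the upstream behind: arm a fresh deadline. */
        st->waiting = true;
        st->deadline = now + timeout;
        return CATCHUP_WAIT;
    }

    return (now >= st->deadline) ? CATCHUP_TIMEOUT : CATCHUP_WAIT;
}
```

Without the reset in the first branch, a second wait that begins after the old deadline has passed would fall straight through to `CATCHUP_TIMEOUT` instead of arming a new deadline.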
>> Two patches attached: v6-0001 for master (extends the
>> walrcv_identify_system API) and v6-backpatch-0001 for stable branches
>> (global variable to preserve ABI).
>>
>> Best regards,
>> Marco
>>
>>
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 18.3 KB |
| nocfbot-v7-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 16.5 KB |