| From: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-03-18 09:49:25 |
| Message-ID: | CA+nrD2eJUfLq8_Ed7fv-7LrmkOoLJ28LwAHh-Rjjg4RU9KOYCg@mail.gmail.com |
| Lists: | pgsql-hackers |
Here are the v4 patches implementing what I described above.
On top of Xuneng's v3 (keeping the wait_for_event and scoped log
window test improvements), the main changes are:
- The wait is now capped at one wal_segment_size. If the gap is
larger, we skip the wait and let START_REPLICATION fail normally
so the startup process can fall back to archive recovery. This avoids
indefinite polling when the upstream is fundamentally behind.
- The first "ahead of flush position" message is logged at LOG and
subsequent ones at DEBUG1, to cut down on noise during a long wait.
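
For readers skimming the thread, the two rules above can be sketched roughly as follows. This is a standalone illustration, not code from the patch: the names `decide_start_action`, `ahead_message_level`, `StartAction` and `LogLevel` are hypothetical, and treating a gap of exactly one segment as still "waitable" is an assumption about the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's 64-bit WAL position type. */
typedef uint64_t XLogRecPtr;

/* Hypothetical outcomes; these names are illustrative only. */
typedef enum
{
    ACTION_STREAM,          /* upstream already has the requested position */
    ACTION_WAIT,            /* upstream is behind, but by at most one segment */
    ACTION_FAIL_TO_ARCHIVE  /* gap too large: let START_REPLICATION fail */
} StartAction;

typedef enum { LVL_LOG, LVL_DEBUG1 } LogLevel;

/*
 * Decide what to do when the requested start point may be ahead of the
 * upstream's flush position. The wait is capped at one WAL segment.
 */
static StartAction
decide_start_action(XLogRecPtr startpoint, XLogRecPtr upstream_flush,
                    uint64_t wal_segment_size)
{
    assert(wal_segment_size > 0);

    if (startpoint <= upstream_flush)
        return ACTION_STREAM;
    if (startpoint - upstream_flush > wal_segment_size)
        return ACTION_FAIL_TO_ARCHIVE;
    return ACTION_WAIT;
}

/*
 * Pick the level for the "ahead of flush position" message: LOG the
 * first time, DEBUG1 afterwards. The caller owns the flag and resets
 * it when a new wait begins.
 */
static LogLevel
ahead_message_level(bool *already_logged)
{
    if (!*already_logged)
    {
        *already_logged = true;
        return LVL_LOG;
    }
    return LVL_DEBUG1;
}
```

With a 16 MB segment size, a start point 1 MB ahead of the upstream's flush position would give `ACTION_WAIT`, while one 32 MB ahead would give `ACTION_FAIL_TO_ARCHIVE`.
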
Two patches attached: v4-0001 for master (extends the
walrcv_identify_system API with an optional server_lsn output
parameter) and v4-backpatch-0001 for stable branches (uses a global
variable to preserve ABI, per Alvaro's suggestion).
Both pass the new TAP test.
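
The difference between the two variants can be illustrated with a small standalone sketch. None of this is taken from the patches: `FakeUpstream`, `identify_system_outparam`, `identify_system_compat` and `last_identified_server_lsn` are hypothetical names, and the real walrcv_identify_system signature is not reproduced here. The only fact relied on is that IDENTIFY_SYSTEM reports the server's current WAL flush location.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's 64-bit WAL position type. */
typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for what an upstream reports on IDENTIFY_SYSTEM. */
typedef struct
{
    uint32_t    timeline;
    XLogRecPtr  flush_lsn;
} FakeUpstream;

/*
 * master-style variant: the function grows an optional output
 * parameter. Callers that do not care pass NULL.
 */
static uint32_t
identify_system_outparam(const FakeUpstream *up, XLogRecPtr *server_lsn)
{
    if (server_lsn != NULL)
        *server_lsn = up->flush_lsn;
    return up->timeline;
}

/*
 * back-branch-style variant: the signature stays as it was, so code
 * compiled against the old declaration keeps working. The extra value
 * is published through a global instead.
 */
static XLogRecPtr last_identified_server_lsn = 0;

static uint32_t
identify_system_compat(const FakeUpstream *up)
{
    last_identified_server_lsn = up->flush_lsn;
    return up->timeline;
}
```

The global is less tidy than the extra parameter, but it leaves the existing function signature untouched, which is the point of the ABI concern on stable branches.
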
Best regards,
Marco
| Attachment | Content-Type | Size |
|---|---|---|
| v4-backpatch-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 14.8 KB |
| v4-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | text/x-patch | 16.6 KB |