| From: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-01-28 17:03:24 |
| Message-ID: | CA+nrD2cTuTkkX5WXVZengTYYZbAO6zV8K+Tri-R0fbLFuoyMBA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi hackers,
I've encountered a bug in PostgreSQL's streaming replication where cascading
standbys fail to reconnect after falling back to archive recovery. The issue
occurs when the upstream standby uses archive-only recovery.
The cascading standby requests streaming from the wrong WAL position (the next
segment boundary instead of its current position), and the connection fails
with this error:
ERROR: requested starting point 0/A000000 is ahead of the WAL flush position of this server 0/9000000
Attached are two shell scripts that reliably reproduce the issue on PostgreSQL
17.x and 18.x:
1. reproducer_restart_upstream_portable.sh - triggers the bug by restarting the upstream standby
2. reproducer_cascade_restart_portable.sh - triggers the bug by restarting the cascading standby
The scripts set up this topology:
- Primary with archiving enabled
- Standby using only archive recovery (no streaming from primary)
- Cascading standby streaming from the archive-only standby
When the cascade loses its streaming connection and falls back to archive
recovery, it cannot reconnect. The issue appears to be in xlogrecovery.c around
line 3880, where the position passed to RequestXLogStreaming() determines which
segment boundary is requested.
The cascade restart reproducer shows that even restarting the cascade itself
triggers the bug, which affects routine maintenance operations.
Scripts require PostgreSQL binaries in PATH and use ports 15432-15434.
Best regards,
Marco
| Attachment | Content-Type | Size |
|---|---|---|
| pgsql-hackers-bug-report-final.md | text/markdown | 3.6 KB |
| reproducer_restart_upstream_portable.sh | application/x-shellscript | 4.3 KB |
| reproducer_cascade_restart_portable.sh | application/x-shellscript | 4.9 KB |