| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Marco Nenciarini <marco(dot)nenciarini(at)enterprisedb(dot)com> |
| Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: BUG: Cascading standby fails to reconnect after falling back to archive recovery |
| Date: | 2026-03-17 11:36:25 |
| Message-ID: | CABPTF7X6pZPhmD0d=Okew4b+XtK3QVHOEZKjxZNnYdkDOL3f_w@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Mar 17, 2026 at 4:13 PM Marco Nenciarini
<marco(dot)nenciarini(at)enterprisedb(dot)com> wrote:
>
> I agree, a standalone test file is the right call here.
>
> I looked at the same candidates. 025_stuck_on_old_timeline.pl is the
> closest thematic match, but its archive command intentionally copies
> only history files and the whole test revolves around promotion and
> timeline following. Adapting it would mean replacing the archive
> command and skipping the promotion, which defeats its original purpose.
>
> The reconnect-after-archive-fallback scenario is distinct enough to
> justify its own file, and at 143 lines it's reasonably small.
>
> Best regards,
> Marco

I’ve applied the patch and verified the fix using the two scripts you
provided earlier, as well as the failing test from v1 provided by
Fujii-san. I’ve also made some small improvements to the TAP test:

1) Added a positive synchronization point using wait_for_event() on
   walreceiver / WalReceiverUpstreamCatchup, so the test now proves that
   it enters the reconnect-behind-upstream window before asserting
   outcomes.
2) Replaced broad log scanning with a scoped log window:
   - capture the logfile offset after rotation
   - use slurp_file(..., $offset) for the post-restart assertions only
   - assert the absence of the old “requested starting point … ahead of
     the WAL flush position” error within that bounded window.

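
Point 2 boils down to remembering where the server log ends before the restart and then searching only the text appended after that offset, so output from earlier in the run cannot cause a false failure. As a rough illustration, here is a minimal Python sketch of that pattern, not the Perl code from the patch; the helper names `log_offset`, `read_since` and `error_absent_since` are hypothetical, and the regex only mirrors the abbreviated error text quoted above rather than the exact server message.

```python
import os
import re
import tempfile

# Mirrors the abbreviated error quoted in the mail; ".*" stands in for the
# part elided with "...", it is NOT the exact server message.
OLD_ERROR = re.compile(r"requested starting point .* ahead of the WAL flush position")


def log_offset(path: str) -> int:
    """Byte offset where future log output will start (the current file size)."""
    return os.path.getsize(path)


def read_since(path: str, offset: int) -> str:
    """Return only the text appended to `path` after `offset`
    (the role slurp_file(..., $offset) plays in the TAP test)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read().decode("utf-8", errors="replace")


def error_absent_since(path: str, offset: int, pattern: re.Pattern = OLD_ERROR) -> bool:
    """True if `pattern` does not occur in the bounded window after `offset`."""
    return pattern.search(read_since(path, offset)) is None


if __name__ == "__main__":
    # Illustrative log lines only; they are not real server output.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
        log.write("old run: requested starting point ... ahead of the WAL flush position\n")
        path = log.name
    offset = log_offset(path)  # hypothetical "after rotation" capture point
    with open(path, "a") as log:
        log.write("new run after restart\n")
    print(error_absent_since(path, 0))       # whole file: old error still present
    print(error_absent_since(path, offset))  # bounded window: clean
    os.remove(path)
```

Scanning the whole file would still see the pre-restart error and fail spuriously; bounding the search to the offset makes the absence check meaningful only for the behaviour after the fix.
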
Please check it.
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Fix-cascading-standby-reconnect-failure-after-arc.patch | application/octet-stream | 15.3 KB |