Re: BUG: Former primary node might stuck when started as a standby

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>
Subject: Re: BUG: Former primary node might stuck when started as a standby
Date: 2026-03-04 08:00:00
Message-ID: 9d4abbe2-95aa-47e0-9ce2-842196a662a7@gmail.com
Lists: pgsql-hackers

Hello Michael,

04.03.2026 07:31, Michael Paquier wrote:
>> I guess so. cluster::stop does the `pg_ctl stop -m fast` command. In this case
>> the walsender waits till there are nothing to be sent, see WalSndLoop().
>> Do let me know if you have observed the similar failure here.
> Exactly. Doing a clean stop of the primary offers a strong guarantee
> here. We are sure that the standby will have received all the records
> from the primary. Timeline forking is an impossible thing in
> 012_subtransactions.pl based on how the switchover from the primary to
> the standby happens. I don't see a need for tweaking this test at
> all. Or perhaps you did see a failure of some kind in this test,
> Alexander?

Yes, 012_subtransactions doesn't fail with an aggressive bgwriter, as I
noted before. I mentioned it precisely to show that the stop mode does
matter here. But if we recognize teardown_node as risky in this context,
maybe it would make sense to also review other tests in recovery/. I
already wrote about 004_timeline_switch, but there are probably more.
E.g., 028_pitr_timelines (I haven't tested it intensively yet) does:
    $node_primary->stop('immediate');

    # Promote the standby, and switch WAL so that it archives a WAL segment
    # that contains all the INSERTs, on a new timeline.
    $node_standby->promote;
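
To look for further candidates, the recovery tests can be scanned for
immediate stops. Below is a minimal sketch, assuming it is run from the top
of a PostgreSQL source tree. The script is hypothetical (it is not something
in the tree), and both the default directory and the regular expression are
assumptions about where and how such stops are written:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the PostgreSQL tree: print every line in
# the recovery TAP tests that stops a node without a clean shutdown.
# The directory can be overridden with the first command-line argument.
my $dir = shift // 'src/test/recovery/t';

for my $file (sort glob("$dir/*.pl"))
{
	open(my $fh, '<', $file) or die "could not open $file: $!";
	while (my $line = <$fh>)
	{
		# Matches teardown_node calls and stop('immediate') / stop("immediate").
		print "$file:$.: $line"
		  if $line =~ /teardown_node|->stop\(\s*['"]immediate['"]/;
	}
	close($fh);
}
```

A match is only a starting point for review: presumably an immediate stop
matters only where the test later relies on the other node having received
all WAL from the stopped one.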

Best regards,
Alexander
