Re: BUG: Former primary node might stuck when started as a standby

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: 'Alexander Lakhin' <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>
Subject: Re: BUG: Former primary node might stuck when started as a standby
Date: 2026-03-04 06:29:35
Message-ID: aafRT3EJOS274tw1@paquier.xyz
Lists: pgsql-hackers

On Wed, Mar 04, 2026 at 02:31:29PM +0900, Michael Paquier wrote:
> As a whole, it looks like we should just switch the teardown() call to
> a stop() call in the first test with xact_009_10, backpatch it, and
> call it a day. No need for injection points and no need for GUC
> tweaks.

With a little more patience, I have reproduced the same failure as
Alexander using the bgwriter trick, -DWAL_DEBUG and his reproducer
script with parallel runs of the 009 recovery test. The attached
patch also proves to work: without the fix, the failure happens at
the 2nd or 3rd iteration, while with the fix the tests last for more
than 50 iterations.

As far as I can see from scanning the history of the test, this is a
copy-pasto dating back to 30820982b295, where the tests were initially
introduced and teardown_node() was copied across the test sequences.
As we want to check that a promoted standby is able to commit the 2PC
transactions issued on the primary, a plain stop() will work equally
well.

I'll push this fix shortly, taking care of one instability. Nice
investigation on this one, Alexander, by the way.
--
Michael
