| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
| Cc: | 'Alexander Lakhin' <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com> |
| Subject: | Re: BUG: Former primary node might stuck when started as a standby |
| Date: | 2026-03-04 06:29:35 |
| Message-ID: | aafRT3EJOS274tw1@paquier.xyz |
| Lists: | pgsql-hackers |
On Wed, Mar 04, 2026 at 02:31:29PM +0900, Michael Paquier wrote:
> As a whole, it looks like we should just switch the teardown() call to
> a stop() call in the first test with xact_009_10, backpatch it, and
> call it a day. No need for injection points and no need for GUC
> tweaks.
With a little more patience, I have reproduced the same failure as
Alexander using the bgwriter trick, -DWAL_DEBUG and his reproducer
script with parallel runs of the 009 recovery test. The attached
patch also proves to work: the failure happens at the 2nd or 3rd
iteration without the fix, while the tests last for more than 50
iterations with the fix.
As far as I can see by scanning the history of the test, this is a
copy-pasto coming from 30820982b295, where the tests were initially
introduced and teardown_node() was copied across the test sequences.
As we want to check that a promoted standby is able to commit the 2PC
transactions issued on the primary, a plain stop() will work equally
well.
I'll push this fix shortly, taking care of one instability. Nice
investigation on this one, Alexander, by the way.
--
Michael