Re: BUG: Former primary node might stuck when started as a standby

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>
Subject:	Re: BUG: Former primary node might stuck when started as a standby
Date:	2026-02-20 02:00:00
Message-ID:	045cab6f-4738-417e-b551-01adba44d6c3@gmail.com
Lists:	pgsql-hackers

Dear Kuroda-san,

19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander,
>
>> Unfortunately, the testing procedure I shared above still produces failures
>> with the patched 009_twophase.pl.
> Hmm, I ran the test for hours, but I could not reproduce the failure. But let me analyze
> based on your log.

Please look at the attached self-contained script. It reproduces the failure
for me (it failed on iterations 6, 12, and 2 just now, on my workstation with
a Ryzen 7900X) -- you could probably adjust the number of parallel jobs to
reproduce it on your hardware.

> I have little experience reading wal_debug output, but the background writer seems to
> generate the RUNNING_XACTS record. That differs from my expectation. To confirm,
> did you really enable the injection point? For now 009_twophase can work without
> `-Dinjection_points=true`, but it should be set to avoid random failures.

I think it failed before the injection point was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
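
To make the ordering of the two LSNs in this log excerpt easier to see, here
is a small sketch (not part of 009_twophase.pl or the attached script) that
converts them to byte positions and subtracts them. The helper name
lsn_to_int is hypothetical; it assumes the usual HI/LO hexadecimal LSN
notation, where the byte position is (HI << 32) + LO.

```shell
#!/bin/sh
# Hypothetical helper: convert an LSN written as "HI/LO" (two hex numbers)
# into a single integer byte position, (HI << 32) + LO.
lsn_to_int() {
    hi=${1%%/*}   # hex digits before the slash
    lo=${1##*/}   # hex digits after the slash
    echo $(( (0x$hi << 32) + 0x$lo ))
}

flush_request=0/030227F8   # from the "xlog flush request" line
running_xacts=0/03022838   # from the background writer's "INSERT @" line

# Distance in bytes between the two positions.
echo $(( $(lsn_to_int "$running_xacts") - $(lsn_to_int "$flush_request") ))
```

It prints 64, i.e. the position logged for the RUNNING_XACTS record is 64
bytes past the flush request issued for PREPARE TRANSACTION.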

As far as I can see, these log lines correspond to this place in the test:
     SAVEPOINT s1;
     INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
     PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
$cur_primary->teardown_node;
$cur_standby->promote;

And as we found out before, wait_for_replay_catchup() before teardown
doesn't help... I can't say for sure, but from my experiments, the test
didn't fail with $cur_primary->stop instead of $cur_primary->teardown_node.

Best regards,
Alexander

Attachment	Content-Type	Size
promote-issue-repro.sh.txt	text/plain	1.3 KB

In response to

RE: BUG: Former primary node might stuck when started as a standby at 2026-02-19 03:50:02 from Hayato Kuroda (Fujitsu)

Responses

Re: BUG: Former primary node might stuck when started as a standby at 2026-03-02 07:00:00 from Alexander Lakhin
