Re: BUG: Former primary node might stuck when started as a standby

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>
Subject: Re: BUG: Former primary node might stuck when started as a standby
Date: 2026-02-20 02:00:00
Message-ID: 045cab6f-4738-417e-b551-01adba44d6c3@gmail.com
Lists: pgsql-hackers

Dear Kuroda-san,

19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander,
>
>> Unfortunately, the testing procedure I shared above still produces failures
>> with the patched 009_twophase.pl.
> Hmm, I ran the test for hours, but I could not reproduce the failure. But let me analyze
> based on your log.

Please look at the attached self-contained script. It works for me (it
failed on iterations 6, 12, and 2 just now, on my workstation with a Ryzen
7900X) -- you could probably adjust the number of parallel jobs to
reproduce it on your hardware.

> I have little experience reading wal_debug output, but the background writer seems to
> generate the RUNNING_XACTS record, which differs from my expectation. To confirm,
> did you really enable the injection point? For now 009_twophase can work without
> `-Dinjection_points=true`, but it should be set to avoid random failures.

I think it failed before the injection point was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:  xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:  PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG:  INSERT @ 0/03022838:  - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
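
To make the ordering of the two positions in this log easier to see, the
LSNs can be compared numerically. This is only a minimal sketch: lsn_to_int
is a hypothetical helper (not part of PostgreSQL or of the attached script),
and it assumes the usual "high/low" hexadecimal LSN notation.

```shell
# Hypothetical helper: convert an LSN written as "HI/LO" (both hexadecimal)
# into a single integer so that two positions can be compared or subtracted.
lsn_to_int() {
    hi=${1%/*}    # part before the slash (high 32 bits)
    lo=${1#*/}    # part after the slash (low 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

flush_request=0/030227F8    # from the "xlog flush request" line
running_xacts=0/03022838    # from the "INSERT @ ... RUNNING_XACTS" line

# Distance in bytes between the two logged positions.
echo $(( $(lsn_to_int "$running_xacts") - $(lsn_to_int "$flush_request") ))
```

So the position logged for RUNNING_XACTS is 64 bytes past the flush request
position of the PREPARE TRANSACTION, that is, the record written by the
background writer lands in WAL after the point the backend asked to flush.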

As far as I can see, it corresponds to this place in the test:
     SAVEPOINT s1;
     INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
     PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
 $cur_primary->teardown_node;
 $cur_standby->promote;

And as we found out before, wait_for_replay_catchup() before the teardown
doesn't help... I can't say for sure, but in my experiments the test
didn't fail with $cur_primary->stop in place of $cur_primary->teardown_node.
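
For reference, the difference between the two methods comes down to the
shutdown mode: as far as I know, PostgreSQL::Test::Cluster's stop() defaults
to a "fast" shutdown, while teardown_node() performs an "immediate" one. A
rough sketch of the corresponding pg_ctl invocations follows; shutdown_cmd
is a hypothetical helper that only prints the command, and the data
directory path is a placeholder.

```shell
# Hypothetical helper: print the pg_ctl command that roughly corresponds to
# each PostgreSQL::Test::Cluster shutdown method (assumed mapping, see above).
shutdown_cmd() {
    method=$1
    datadir=$2
    case "$method" in
        stop)          echo "pg_ctl -D $datadir -m fast stop" ;;       # clean shutdown
        teardown_node) echo "pg_ctl -D $datadir -m immediate stop" ;;  # no clean shutdown
        *)             echo "unknown method: $method" >&2; return 1 ;;
    esac
}

shutdown_cmd stop /path/to/pgdata
shutdown_cmd teardown_node /path/to/pgdata
```

With a fast shutdown the primary writes a shutdown checkpoint and gives
walsenders a chance to send the remaining WAL, which an immediate shutdown
does not do; that might explain the difference, though I have not verified it.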

Best regards,
Alexander

Attachment Content-Type Size
promote-issue-repro.sh.txt text/plain 1.3 KB
