| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com> |
| Subject: | Re: BUG: Former primary node might stuck when started as a standby |
| Date: | 2026-02-20 02:00:00 |
| Message-ID: | 045cab6f-4738-417e-b551-01adba44d6c3@gmail.com |
| Lists: | pgsql-hackers |
Dear Kuroda-san,
19.02.2026 05:50, Hayato Kuroda (Fujitsu) wrote:
> Dear Alexander,
>
>> Unfortunately, the testing procedure I shared above still produces failures
>> with the patched 009_twophase.pl.
> Hmm, I ran the test for hours, but I could not reproduce the failure. But let me analyze
> based on your log.
Please look at the attached self-contained script. It reproduces the failure
for me (it failed on iterations 6, 12, and 2 just now, on my workstation with
a Ryzen 7900X) -- you could probably adjust the number of parallel jobs to
reproduce it on your hardware.
> I have little experience reading the wal_debug output, but the background writer seems to
> generate the RUNNING_XACTS record. It's different from my expectation. To confirm,
> did you really enable the injection point? For now 009_twophase can work without
> the `-Dinjection_points=true` but it should be set to avoid random failures.
I think it failed before the injection point was set. My log contains:
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
As far as I can see, it corresponds to this place in the test:
SAVEPOINT s1;
INSERT INTO t_009_tbl VALUES (22, 'issued to ${cur_primary_name}');
PREPARE TRANSACTION 'xact_009_10';");
+$cur_primary->wait_for_replay_catchup($cur_standby);
$cur_primary->teardown_node;
$cur_standby->promote;
And as we found out before, wait_for_replay_catchup() before teardown
doesn't help... I can't say for sure, but in my experiments, the test
didn't fail with $cur_primary->stop instead of $cur_primary->teardown_node.
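
To make the comparison concrete, the variant amounts to the following change in the test. This is only a fragment, not a standalone script: it assumes the $cur_primary and $cur_standby nodes that 009_twophase.pl already sets up. In PostgreSQL::Test::Cluster, teardown_node() stops the server in immediate mode, while stop() without arguments uses fast mode.

```perl
# Fragment of the TAP test; $cur_primary and $cur_standby are the
# PostgreSQL::Test::Cluster nodes created earlier in 009_twophase.pl.

# Original step: immediate-mode shutdown of the primary.
# $cur_primary->teardown_node;

# Variant from the experiments: stop() defaults to a 'fast' shutdown.
$cur_primary->stop;
$cur_standby->promote;
```

This only illustrates the substitution; it is not a confirmed fix.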
Best regards,
Alexander
| Attachment | Content-Type | Size |
|---|---|---|
| promote-issue-repro.sh.txt | text/plain | 1.3 KB |