| From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
|---|---|
| To: | 'Alexander Lakhin' <exclusion(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com> |
| Subject: | RE: BUG: Former primary node might stuck when started as a standby |
| Date: | 2026-02-19 03:50:02 |
| Message-ID: | OS9PR01MB12149D4F1A2BC23637688CE4DF56BA@OS9PR01MB12149.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Dear Alexander,
> Unfortunately, the testing procedure I shared above still produces failures
> with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could nor reproduce the failure. But let me analyze
based on your log.
> With wal_debug = on
> in /tmp/temp.config, I can see:
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: statement: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: INSERT @ 0/030227F8: - Transaction/PREPARE: gid xact_009_10: 2026-02-17 07:06:44.313023+02; subxacts: 790
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog flush request 0/030227F8; write 0/00000000; flush 0/00000000
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @ 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788 oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
> 2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG: statement: SELECT pg_current_wal_flush_lsn()
> 2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG: statement: SELECT '0/030227F8' <= replay_lsn AND state = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('paris', 'walreceiver')
> 2026-02-17 07:06:44.337 EET postmaster[754306] LOG: received immediate shutdown request
> 2026-02-17 07:06:44.339 EET postmaster[754306] LOG: database system is shut down
I have few experience to see the wal_debug output, but background writer seems to
generate the RUNNING_XACTS record. It's different from my expectation. To confirm,
did you really enable the injection point? For now 009_twophase can work without
the `-Dinjection_points=true` but it should be set to avoid random failures.
Another possibility is that bgwriter has already been passed `IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts")`
before attaching the injection point. In this case background writer would
generate the RUNNING_XACTS records at the bad timing. One possible workaround
is to attach the injeciton_point bit earlier, it does not solve the root cause
though. Attached patch applies the band-aid.
> Regarding the approach with "skip-log-running-xacts", I wonder if there is
> a strong guarantee that no other WAL write can happen in the same window
> (e.g., from autovacuum)?
Good point, autovacuum and checkpoint should be also avoided. IIUC it can be done
by setting GUCs.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Stabilize-009_twophase.pl.patch | application/octet-stream | 5.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-02-19 03:58:12 | Re: Flush some statistics within running transactions |
| Previous Message | Andreas Karlsson | 2026-02-19 03:49:13 | Re: DOCS - Add an "Options" section for pg_controldata |