| From: | surya poondla <suryapoondla4(at)gmail(dot)com> |
|---|---|
| To: | cca5507 <cca5507(at)qq(dot)com> |
| Cc: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [BUG] Take a long time to reach consistent after pg_rewind |
| Date: | 2026-06-29 18:53:27 |
| Message-ID: | CAOVWO5pj-BOAtSCkuGLCA3HLdFJJ_3hawZ9JLvF2BckRj+15rQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi ChangAo,
Thanks for the v3. The commit message, in-line comment, and the
rewind_source.h note all look good.
On the test front: I don't think a hang-detection test can be made
reliable. The bug requires the source's insert LSN to be exactly
segment_boundary + SizeOfXLogLongPHD with no further WAL activity, but
bgwriter's periodic LogStandbySnapshot emits a RUNNING_XACTS record, which
can advance the insert LSN nondeterministically between pg_switch_wal()
and the rewind. In my reproduction bgwriter ended the hang after ~9s;
that's the kind of timing dependency we don't want in CI.
The deterministic alternative is to parse pg_controldata on the target
after pg_rewind and assert that minRecoveryPoint does not land at
"boundary + SizeOfXLogLongPHD". That's a direct check on the patched
behavior, independent of source idleness or replay timing. However, it
doesn't exercise the integration property that the rewound node reaches
consistency without further upstream WAL, so I am not sure it is a
complete test case for our scenario.
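To make the pg_controldata check concrete, here is a rough sketch of the
assertion logic in Python (an actual test would presumably live in the Perl
TAP suite). It assumes SizeOfXLogLongPHD is 40 bytes, which holds on typical
64-bit builds but is hardcoded here rather than read from the server, and it
reads the segment size from the "Bytes per WAL segment" line. The function
names are made up for illustration.

```python
import re

# Assumption: MAXALIGN(sizeof(XLogLongPageHeaderData)) on a typical 64-bit build.
SIZE_OF_XLOG_LONG_PHD = 40


def parse_lsn(text):
    """Convert an LSN printed as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def control_field(output, label):
    """Return the value printed after 'label:' in pg_controldata output."""
    match = re.search(rf"^{re.escape(label)}:\s*(.+)$", output, re.MULTILINE)
    if match is None:
        raise ValueError(f"field not found: {label}")
    return match.group(1).strip()


def min_recovery_is_just_past_long_header(output):
    """True if minRecoveryPoint sits at segment boundary + long page header."""
    lsn = parse_lsn(control_field(output, "Minimum recovery ending location"))
    seg_size = int(control_field(output, "Bytes per WAL segment"))
    return lsn % seg_size == SIZE_OF_XLOG_LONG_PHD
```

The test would run pg_controldata on the rewound target, pass its stdout to
min_recovery_is_just_past_long_header(), and fail if it returns True. As
noted above, this only checks the recorded minRecoveryPoint, not that the
node actually reaches consistency.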
Regards,
Surya Poondla