| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline |
| Date: | 2026-06-12 00:57:05 |
| Message-ID: | CABPTF7WSpNOYu84fjGH2t56BctRzVD7t8WqhgvML2DRh8Vtfog@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Michael,
On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> > On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
> >> The
> >> essential thing is just to ensure that the startup remains paused
> >> until decoding output is observed.
> >
> > Right, thanks for confirming. That's exactly what v2 is doing.
>
> I have looked at this thread, and my first impression was that this
> could be a data integrity issue while decoding changes due to the
> transient errors one could see across the promotion requests.
>
> But it's less severe than I thought initially: we have an availability
> problem here, down to v16, with a correct recovery possible once the
> promotion request has completed. That could be indeed surprising for
> users that have HA setups with standbys doing logical decoding.. The
> SQL function path is less worrying to me, there are as far as I know
> few users of it compared to the "native" path with sync workers.
>
> read_local_xlog_page_guts() does not only impact SQL-callable logirep
> functions, even it is the spot that should be hit most of the time
> (again, the RecoveryInProgress() vs promotion window is super narrow).
> At quick glance, things are:
> - walinspect.
> - Slot advance.
> - Slot creation (?), but it feels even narrower.
Yeah, it is used for two-phase commit as well. Its usage is broader
than I had noticed before. The repack worker also makes use of it.
> With two items dealt with on this thread for these two callback paths
> changed, moving on the part related to physical replication into its
> own thread would be better. This requires an entirely different
> analysis and a different lookup.
+1
> The backpatch of PG16 is straight-forward and adding
> GetWALInsertionTimeLineIfSet() down there does not look like an issue.
> Not having any tests in v16 feels sad, but that's life. It does not
> prevent addressing the availability issue on this branch.
>
> I'll go take it up from here.
> --
Thanks for dealing with this!
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.