Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: 2026-06-11 01:15:01
Message-ID: aioMFYcZhLB6urla@paquier.xyz
Lists: pgsql-hackers

On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
>> The
>> essential thing is just to ensure that the startup remains paused
>> until decoding output is observed.
>
> Right, thanks for confirming. That's exactly what v2 is doing.

I have looked at this thread, and my first impression was that this
could be a data integrity issue while decoding changes due to the
transient errors one could see across the promotion requests.

But it's less severe than I thought initially: we have an availability
problem here, down to v16, with a correct recovery possible once the
promotion request has completed. That could indeed be surprising for
users that have HA setups with standbys doing logical decoding. The
SQL function path is less worrying to me: there are, as far as I know,
few users of it compared to the "native" path with sync workers.

read_local_xlog_page_guts() does not only impact SQL-callable logirep
functions, even if that is the spot that should be hit most of the time
(again, the RecoveryInProgress() vs promotion window is super narrow).
At a quick glance, the other things impacted are:
- pg_walinspect.
- Slot advance.
- Slot creation (?), but it feels even narrower.

With the two items dealt with on this thread being the two callback
paths changed, moving the part related to physical replication into
its own thread would be better. That part requires an entirely
different analysis and a different lookup.

The backpatch to v16 is straightforward, and adding
GetWALInsertionTimeLineIfSet() down there does not look like an issue.
Not having any tests in v16 feels sad, but that's life. It does not
prevent addressing the availability issue on this branch.

I'll go take it up from here.
--
Michael
