Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: 2026-06-09 07:49:50
Message-ID: CABPTF7VovNWnH=dG_A=e6xkMYt11ubbj935Ymfky55cXcWcrrA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Mon, Jun 08, 2026 at 09:00:17PM +0800, Xuneng Zhou wrote:
> > > I tweaked the reproducer based on the theory outlined above. The main
> > > changes from the original reproducer are:
> > >
> > > 1) blocks at logical-walsender-after-slot-acquire in walsender.c,
> > > before the decoding context is created and before the reader starts
> > > from restart_lsn, matching the delay set by Alexander
> >
> > > 2) Forces the first read to occur during promotion. It inserts rows
> > > 1..4, waits for replay, starts promotion with pg_promote(false), holds
> > > startup at startup-logical-decoding-status-change-end-of-recovery,
> > > then wakes the walsender.
> >
> > Yeah, so this existing startup-logical-decoding-status-change-end-of-recovery
> > injection point already exists in the code tree and is also called after
> > CleanupAfterArchiveRecovery() and before RECOVERY_STATE_DONE change in
> > StartupXLOG().
> >
> > So this is the same window as the new injection point that was added in the new
> > tests in v1-0002 shared up-thread [1].
> >
> > That said, I think I prefer the v1-0002 shared up-thread [1] approach for the
> > tests as:
> >
> > - the injection point name clearly describes the tested condition
> > - no new injection point is added in walsender.c (it pauses startup mid-promotion
> > and lets the walsender connect)
> > - the test relies on one injection point (not two)
> >
> > [1]: https://postgr.es/m/aiaBtENl7tTf2MM8%40bdtpg
> >
>
> Thanks for the clarification. I haven't reviewed the patch set other
> than applying the patch 1 yet, but I plan to do so tomorrow.

I've readed through the patch set. They look good overall. Here're
some comments on them:

1) In the commit messages and comments for all four patches, the
reason why the target WAL segment cannot be found on the old timeline
is described as follows:

"old timeline WAL segments have already been removed or
recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."

Is mentioning the 'remove' case only a bit narrow?

The timeline-selection comment says this explicitly:
"there's no guarantee the old segment will still exist. It may have been
deleted or renamed with a .partial suffix"

How about phrasing it like:
old timeline WAL files may have been removed, recycled, or renamed to .partial.

After running the reproducer provided by Hayato-san, the standby’s
pg_wal directory looked like this following the failure:
000000010000000000000003.partial
00000002.history
000000020000000000000003
000000020000000000000004

So in this repro, the requested file:

000000010000000000000003

was not unlinked as a regular "removed" file. It had been renamed to:

000000010000000000000003.partial

but the log says this explicitly:
ERROR: requested WAL segment 000000010000000000000003 has already been removed

It appears inconsistent to me...

2) Injection points in tests 0002 and 0004

> > - the injection point name clearly describes the tested condition

I think this is true. It better reflects the intention than
startup-logical-decoding-status-change-end-of-recovery does. However,
I don't know whether this alone warrants a new injection point.

> > - no new injection point is added in walsender.c (it pauses startup mid-promotion
> > and lets the walsender connect)
> > - the test relies on one injection point (not two)

I am unsure that the injection point could replace the walsender-side
synchronization need.

The race has two moving processes:

- startup process: promotion cleanup / RECOVERY_STATE_DONE transition
- walsender process: logical_read_xlog_page() timeline choice

A startup injection point controls only this:
startup is paused in the vulnerable promotion window

It does not prove this:
walsender has reached logical_read_xlog_page() while startup is paused

3) Stricter synchronization point in both tests
Both tests use this condition "active_pid IS NOT NULL" for
synchronization at the walsender side. However, it only proves that
pg_recvlogical has connected walsender has acquired the logical slot,
not necessarily the walsender is paused after acquiring the slot and
before the promotion window is set. There are several potential states
for walsender in this condition:

walsender is just after ReplicationSlotAcquire()
it has called XLogBeginRead()
it is already inside logical_read_xlog_page()
it already opened the WAL segment
it already failed or succeeded

The test cannot distinguish those states reliably.

So we may still need another injection point for synchronization at
the walsender side

4) Stricter result checks in test files
The surrounding 035 test is stricter than the test in 0002. It first
waits for COMMITs, then compares exact decoded output. Should we
adhere to this pattern too?

0004's \d+ check seems somewhat loose to me. Should we capture the
query result and compare it to the expected count, or select data and
compare the decoded rows?

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2026-06-09 07:59:38 Re: Why is the LSN reported for pg_logical_emit_message() different from other decoded operations?
Previous Message shaobo zhang 2026-06-09 07:46:42 Re: Fix missing semicolon in pl_gram.y for option_value rule