From: sunil s <sunilfeb26(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: soumyadeep2007(at)gmail(dot)com, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, daniel(at)yesql(dot)se, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, lchch1990(at)sina(dot)cn, masahiko(dot)sawada(at)2ndquadrant(dot)com, hawu(at)pivotal(dot)io, a(dot)lubennikova(at)postgrespro(dot)ru, ashwinstar(at)gmail(dot)com
Subject: Re: Unnecessary delay in streaming replication due to replay lag
Date: 2025-07-08 18:31:55
Message-ID: CAOG6S48rsxPkK7wx7wkU0xqJeKO_XS7S+cLiTXpzj0a7VpsC1Q@mail.gmail.com
Lists: pgsql-hackers
Hello Hackers,
I recently had the opportunity to continue the effort originally led by a
valued contributor.
I’ve addressed most of the previously reported feedback and issues, and
would like to share the updated patch with the community.
IMHO, starting the WAL receiver eagerly offers significant advantages, for
the following reasons:
1. If recovery_min_apply_delay is set high (for various operational reasons)
   and the primary crashes, the mirror can recover quickly, thereby improving
   overall High Availability (see the illustrative settings below).
2. For setups without archive-based recovery, restore and recovery operations
   complete faster.
3. When synchronous_commit is enabled, faster mirror recovery reduces offline
   time and helps avoid prolonged commit/query wait times during
   failover/recovery.
4. This approach also improves resilience by limiting the impact of network
   interruptions on replication.
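For illustration only (not part of the patch, and the values are
hypothetical), a standby affected by point 1 might be configured roughly
like this:

    # standby postgresql.conf (illustrative values only)
    recovery_min_apply_delay = '30min'   # replay deliberately held back
    primary_conninfo = 'host=primary.example.com user=replicator'

With the eager start, such a standby can begin streaming as soon as it
reaches consistency after a crash, instead of waiting for the deliberately
delayed replay to reach the end of its local WAL first.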
> In common cases, I believe archive recovery is faster than
> replication. If a segment is available from archive, we don't need to
> prefetch it via stream.
I completely agree: restoring from the archive is significantly faster than
streaming. Attempting to stream starting from the last WAL segment available
in the archive would introduce complexity and risk. Therefore, we can limit
this feature to crash recovery scenarios and skip it when archiving is
enabled (a rough sketch of the intended gating condition follows below).
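To make the intent concrete, here is a rough, purely illustrative sketch of
that gating condition. It is not the code from the attached patch;
ShouldStartWalReceiverEagerly() is a hypothetical helper name, and the sketch
assumes the existing ArchiveRecoveryRequested and StandbyModeRequested flags
in the startup code:

    /* Illustrative sketch only, not lifted from the attached patch. */
    static bool
    ShouldStartWalReceiverEagerly(void)
    {
        /*
         * Skip the eager start when archive recovery was requested:
         * restoring segments from the archive is typically faster than
         * streaming them, and it avoids two code paths installing the
         * same segments into pg_wal.
         */
        if (ArchiveRecoveryRequested)
            return false;

        /* Only fire for a standby performing crash recovery. */
        return StandbyModeRequested;
    }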
> The "FATAL: could not open file" message from walreceiver means that
the walreceiver was operationally prohibited to install a new wal
segment at the time.
This was caused by an additional fix added upstream to address a race
condition between the archiver and the checkpointer. It has been resolved in
the latest patch, which also includes a TAP test to verify the fix. Thanks
for testing and bringing this to our attention.
For now, we will skip the early WAL receiver start, since enabling write
access for the WAL receiver would reintroduce the bug that commit
cc2c7d65fc27e877c9f407587b0b92d46cd6dd16
<https://github.com/postgres/postgres/commit/cc2c7d65fc27e877c9f407587b0b92d46cd6dd16>
fixed previously.
I've attached the rebased patch with the necessary fix.
Thanks & Regards,
Sunil S (Broadcom)
On Tue, Jul 8, 2025 at 11:01 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:
> At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty <
> soumyadeep2007(at)gmail(dot)com> wrote in
> > Sure, that makes more sense. Fixed.
>
> As I played with this briefly. I started a standby from a backup that
> has an access to archive. I had the following log lines steadily.
>
>
> [139535:postmaster] LOG: database system is ready to accept read-only
> connections
> [139542:walreceiver] LOG: started streaming WAL from primary at 0/2000000
> on timeline 1
> cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003':
> No such file or directory
> [139542:walreceiver] FATAL: could not open file
> "pg_wal/000000010000000000000003": No such file or directory
> cp: cannot stat '/home/horiguti/data/arc_work/00000002.history': No such
> file or directory
> cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003':
> No such file or directory
> [139548:walreceiver] LOG: started streaming WAL from primary at 0/3000000
> on timeline 1
>
> The "FATAL: could not open file" message from walreceiver means that
> the walreceiver was operationally prohibited to install a new wal
> segment at the time. Thus the walreceiver ended as soon as started.
> In short, the eager replication is not working at all.
>
>
> I have a comment on the behavior and objective of this feature.
>
> In the case where archive recovery is started from a backup, this
> feature lets walreceiver start while the archive recovery is ongoing.
> If walreceiver (or the eager replication) worked as expected, it would
> write wal files while archive recovery writes the same set of WAL
> segments to the same directory. I don't think that is a sane behavior.
> Or, if putting more modestly, an unintended behavior.
>
> In common cases, I believe archive recovery is faster than
> replication. If a segment is available from archive, we don't need to
> prefetch it via stream.
>
> If this feature is intended to use only for crash recovery of a
> standby, it should fire only when it is needed.
>
> If not, that is, if it is intended to work also for archive recovery,
> I think the eager replication should start from the next segment of
> the last WAL in archive but that would invite more complex problems.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
Attachments:
  v6-0001-Introduce-feature-to-start-WAL-receiver-eagerly.patch (application/octet-stream, 14.8 KB)
  v6-0002-Test-WAL-receiver-early-start-upon-reaching-consi.patch (application/octet-stream, 4.7 KB)
  v6-0003-Test-archive-recovery-takes-precedence-over-strea.patch (application/octet-stream, 4.2 KB)