Re: Unnecessary delay in streaming replication due to replay lag

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: soumyadeep2007(at)gmail(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, daniel(at)yesql(dot)se, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, lchch1990(at)sina(dot)cn, masahiko(dot)sawada(at)2ndquadrant(dot)com, hawu(at)pivotal(dot)io, a(dot)lubennikova(at)postgrespro(dot)ru, ashwinstar(at)gmail(dot)com
Subject: Re: Unnecessary delay in streaming replication due to replay lag
Date: 2021-12-16 10:05:19
Message-ID: 20211216.190519.572575190462409312.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 15 Dec 2021 17:01:24 -0800, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> wrote in
> Sure, that makes more sense. Fixed.

I played with this briefly. I started a standby from a backup that
has access to the archive, and I steadily got the following log lines.

[139535:postmaster] LOG: database system is ready to accept read-only connections
[139542:walreceiver] LOG: started streaming WAL from primary at 0/2000000 on timeline 1
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139542:walreceiver] FATAL: could not open file "pg_wal/000000010000000000000003": No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/00000002.history': No such file or directory
cp: cannot stat '/home/horiguti/data/arc_work/000000010000000000000003': No such file or directory
[139548:walreceiver] LOG: started streaming WAL from primary at 0/3000000 on timeline 1

The "FATAL: could not open file" message from walreceiver means that
the walreceiver was not permitted to install a new WAL segment at
that time. Thus the walreceiver exited as soon as it started.
In short, the eager replication is not working at all.
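
For readers matching the segment file names in the log against the
streaming start positions, the following is a minimal sketch of the
mapping. It assumes the default 16MB wal_segment_size; the function
name segment_start_lsn is hypothetical and is not part of PostgreSQL
or of the patch under discussion.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size (16MB)


def segment_start_lsn(filename, segment_size=WAL_SEGMENT_SIZE):
    """Return (timeline, start LSN) for a 24-hex-digit WAL segment file name.

    The name is three 8-digit hex fields: timeline ID, high part of the
    segment number, and low part of the segment number.
    """
    if len(filename) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    timeline = int(filename[0:8], 16)
    log_id = int(filename[8:16], 16)
    seg_id = int(filename[16:24], 16)

    # Number of segments that fit in one 4GB "log" unit.
    segments_per_log = 0x100000000 // segment_size
    segno = log_id * segments_per_log + seg_id
    start = segno * segment_size

    # LSNs are printed as two hex numbers: high 32 bits / low 32 bits.
    return timeline, f"{start >> 32:X}/{start & 0xFFFFFFFF:X}"


print(segment_start_lsn("000000010000000000000003"))  # (1, '0/3000000')
```

Under that assumption, the segment that could not be opened,
000000010000000000000003, begins at 0/3000000 on timeline 1, which is
the position at which the restarted walreceiver began streaming.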

I have a comment on the behavior and objective of this feature.

In the case where archive recovery is started from a backup, this
feature lets walreceiver start while the archive recovery is ongoing.
If walreceiver (or the eager replication) worked as expected, it would
write WAL files while archive recovery writes the same set of WAL
segments to the same directory. I don't think that is sane behavior.
Or, to put it more modestly, it is unintended behavior.

In common cases, I believe archive recovery is faster than
replication. If a segment is available from the archive, we don't need
to prefetch it via streaming.

If this feature is intended to be used only for crash recovery of a
standby, it should fire only when it is needed.

If not, that is, if it is intended to work also for archive recovery,
I think the eager replication should start from the segment following
the last WAL segment in the archive, but that would invite more
complex problems.
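
To make "the segment following the last WAL segment in the archive"
concrete, here is a rough sketch of how that name could be derived from
a directory listing. It is only an illustration: next_segment_name is a
hypothetical helper, a 16MB segment size is assumed, and the sketch
ignores timeline switches and partially archived segments, which are
among the complications such an approach would have to handle.

```python
import string


def next_segment_name(archived, segment_size=16 * 1024 * 1024):
    """Return the file name of the segment after the newest archived one.

    `archived` is an iterable of file names found in the archive.
    Anything that is not a 24-hex-digit segment name (for example
    history files) is ignored. Returns None if no segment is present.
    """
    segments_per_log = 0x100000000 // segment_size

    def parse(name):
        timeline = int(name[0:8], 16)
        segno = int(name[8:16], 16) * segments_per_log + int(name[16:24], 16)
        return timeline, segno

    wal = [n for n in archived
           if len(n) == 24 and all(c in string.hexdigits for c in n)]
    if not wal:
        return None

    # Simplification: take the highest (timeline, segment number) pair.
    timeline, segno = max(parse(n) for n in wal)
    segno += 1
    return (f"{timeline:08X}"
            f"{segno // segments_per_log:08X}"
            f"{segno % segments_per_log:08X}")


listing = ["000000010000000000000001",
           "000000010000000000000002",
           "00000002.history"]
print(next_segment_name(listing))  # 000000010000000000000003
```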

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
