Re: Unnecessary delay in streaming replication due to replay lag

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "lchch1990(at)sina(dot)cn" <lchch1990(at)sina(dot)cn>
Cc: Asim Praveen <pasim(at)vmware(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Hao Wu (Pivotal)" <hawu(at)pivotal(dot)io>, "ahsan(dot)hadi" <ahsan(dot)hadi(at)highgo(dot)ca>
Subject: Re: Unnecessary delay in streaming replication due to replay lag
Date: 2020-11-20 08:21:06
Message-ID: 20201120082106.GG8506@paquier.xyz
Lists: pgsql-hackers

On Tue, Sep 15, 2020 at 05:30:22PM +0800, lchch1990(at)sina(dot)cn wrote:
> I read the code and test the patch, it run well on my side, and I have several issues on the
> patch.

+		RequestXLogStreaming(ThisTimeLineID,
+							 startpoint,
+							 PrimaryConnInfo,
+							 PrimarySlotName,
+							 wal_receiver_create_temp_slot);

This patch assumes that it is fine to request streaming even if
PrimaryConnInfo is not set, but it is not: without it, a WAL receiver
has nothing to connect to.
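
As a minimal sketch of the kind of guard meant here (a standalone
illustration, not code from the patch or from xlog.c; conninfo_is_set()
and maybe_request_streaming() are hypothetical names), the request would
only be issued when a connection string is actually configured:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true only if a non-empty connection string is set. */
static bool
conninfo_is_set(const char *conninfo)
{
	return conninfo != NULL && conninfo[0] != '\0';
}

/*
 * Hypothetical stand-in for the call site: returns true if streaming
 * would be requested, false if the request has to be skipped.
 */
static bool
maybe_request_streaming(const char *conninfo)
{
	if (!conninfo_is_set(conninfo))
		return false;		/* nothing to connect to, no WAL receiver */

	/* the real code would call RequestXLogStreaming() at this point */
	return true;
}
```

Both a NULL and an empty string count as "not set" in this sketch.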

Anyway, I don't quite understand what you are trying to achieve here.
"startpoint" is used to request the beginning of streaming. It is
roughly the consistency LSN + some alpha with some checks on WAL
pages (those WAL page checks are not acceptable as they make
maintenance harder). What about the case where consistency is
reached but there are many segments still ahead that need to be
replayed? Your patch would cause streaming to begin too early, and
a manual copy of segments is not rare: in some environments a bulk
copy of segments makes a standby catch up faster than streaming.

It seems to me that what you are looking for here is some kind of
pre-processing before entering the redo loop to determine the LSN
that could be reused for the fast streaming start, which should match
the end of the WAL present locally. In short, you would need an
XLogReaderState that begins a scan of WAL from the redo point until it
cannot find anything more, and use the last LSN found as a base to
begin requesting streaming. The question of timeline jumps can also
be very tricky, but it could also be possible to not allow this option
if a timeline jump happens while attempting to guess the end of WAL
ahead of time. Another thing: could it be useful to have an extra
mode to begin streaming without waiting for consistency to be reached?
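
To make the pre-processing idea a bit more concrete, here is a rough,
self-contained sketch. It does not use the real XLogReaderState API:
ToyRecord, ToyLsn and guess_end_of_local_wal() are made-up names that
only model the logic, namely scan from the redo point while records are
contiguous and valid, stop at the first record that cannot be read, and
give up on the fast start if a timeline jump shows up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;	/* stand-in for XLogRecPtr */

/* Toy model of one WAL record available locally (hypothetical layout). */
typedef struct ToyRecord
{
	ToyLsn		start;		/* LSN where the record begins */
	uint32_t	len;		/* total record length in bytes */
	uint32_t	tli;		/* timeline the record belongs to */
	bool		valid;		/* false models a failed header/CRC check */
} ToyRecord;

/*
 * Scan forward from redo_lsn and store in *end_lsn the end of the last
 * valid, contiguous record.  Returns false, meaning "fall back to the
 * normal startup path", if no record starts at redo_lsn or if a record
 * on another timeline is found.
 */
static bool
guess_end_of_local_wal(const ToyRecord *recs, size_t nrecs,
					   ToyLsn redo_lsn, uint32_t tli, ToyLsn *end_lsn)
{
	ToyLsn		next = redo_lsn;
	size_t		i = 0;

	assert(recs != NULL && end_lsn != NULL);

	/* Locate the record that begins exactly at the redo point. */
	while (i < nrecs && recs[i].start != redo_lsn)
		i++;
	if (i == nrecs)
		return false;

	for (; i < nrecs; i++)
	{
		if (recs[i].start != next || !recs[i].valid)
			break;			/* gap or broken record: end of local WAL */
		if (recs[i].tli != tli)
			return false;	/* timeline jump: do not guess */
		next = recs[i].start + recs[i].len;
	}

	*end_lsn = next;
	return true;
}
```

In this model the LSN stored in *end_lsn is what would be passed as the
streaming start position in place of "startpoint", and a false result
means the option is simply not used for that startup.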
--
Michael
