Re: Unnecessary delay in streaming replication due to replay lag

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>, "lchch1990(at)sina(dot)cn" <lchch1990(at)sina(dot)cn>
Cc: Asim Praveen <pasim(at)vmware(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Hao Wu (Pivotal)" <hawu(at)pivotal(dot)io>, "ahsan(dot)hadi" <ahsan(dot)hadi(at)highgo(dot)ca>
Subject: Re: Unnecessary delay in streaming replication due to replay lag
Date: 2020-12-01 14:21:51
Message-ID: 512fe29e-d772-baaa-1874-30e22fe68706@postgrespro.ru
Lists: pgsql-hackers

On 20.11.2020 11:21, Michael Paquier wrote:
> On Tue, Sep 15, 2020 at 05:30:22PM +0800, lchch1990(at)sina(dot)cn wrote:
>> I read the code and tested the patch. It runs well on my side, and I have several issues with the
>> patch.
> +	RequestXLogStreaming(ThisTimeLineID,
> +						 startpoint,
> +						 PrimaryConnInfo,
> +						 PrimarySlotName,
> +						 wal_receiver_create_temp_slot);
>
> This patch thinks that it is fine to request streaming even if
> PrimaryConnInfo is not set, but that's not fine.
>
> Anyway, I don't quite understand what you are trying to achieve here.
> "startpoint" is used to request the beginning of streaming. It is
> roughly the consistency LSN + some alpha with some checks on WAL
> pages (those WAL page checks are not acceptable as they make
> maintenance harder). What about the case where consistency is
> reached but there are many segments still ahead that need to be
> replayed? Your patch would cause streaming to begin too early, and
> a manual copy of segments is not a rare thing as in some environments
> a bulk copy of segments can make the catchup of a standby faster than
> streaming.
>
> It seems to me that what you are looking for here is some kind of
> pre-processing before entering the redo loop to determine the LSN
> that could be reused for the fast streaming start, which should match
> the end of the WAL present locally. In short, you would need an
> XLogReaderState that begins a scan of WAL from the redo point until it
> cannot find anything more, and use the last LSN found as a base to
> begin requesting streaming. The question of timeline jumps can also
> be very tricky, but it could also be possible to not allow this option
> if a timeline jump happens while attempting to guess the end of WAL
> ahead of time. Another thing: could it be useful to have an extra
> mode to begin streaming without waiting for consistency to finish?
> --
> Michael
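
To make the pre-scan idea quoted above concrete, here is a rough, self-contained sketch. It is only a toy model and does not use the real XLogReaderState API: the ToyRecord array stands in for WAL present locally, and every name in it (ToyLsn, ToyRecord, ToyScanResult, toy_guess_end_of_local_wal) is hypothetical. It walks forward from the redo point until no further valid record is found, and gives up on the fast start if it sees a timeline jump, as suggested in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only, NOT PostgreSQL's real types or API. */
typedef uint64_t ToyLsn;

typedef struct ToyRecord
{
	ToyLsn		start;		/* LSN where the record begins */
	ToyLsn		end;		/* LSN just past the end of the record */
	uint32_t	timeline;	/* timeline the record belongs to */
	bool		valid;		/* false models a torn or missing record */
} ToyRecord;

typedef struct ToyScanResult
{
	bool		usable;		/* false if a timeline jump was seen */
	ToyLsn		stream_from;	/* end of the contiguous local WAL scanned */
} ToyScanResult;

/*
 * Scan records (assumed sorted by LSN) from redo_point until nothing more
 * can be read, and report the last LSN reached.  A timeline jump makes the
 * result unusable, so a caller would fall back to the existing behavior.
 */
static ToyScanResult
toy_guess_end_of_local_wal(const ToyRecord *records, size_t nrecords,
						   ToyLsn redo_point, uint32_t timeline)
{
	ToyScanResult result = {true, redo_point};
	ToyLsn		expected = redo_point;

	assert(records != NULL || nrecords == 0);

	for (size_t i = 0; i < nrecords; i++)
	{
		const ToyRecord *rec = &records[i];

		if (rec->start < redo_point)
			continue;			/* before the redo point, not scanned */

		if (!rec->valid || rec->start != expected)
			break;				/* end of readable local WAL, or a gap */

		if (rec->timeline != timeline)
		{
			result.usable = false;	/* timeline jump: do not guess */
			break;
		}

		expected = rec->end;
	}

	result.stream_from = expected;
	return result;
}
```

With records covering 100-150, 150-220 and 220-300 on timeline 1 followed by an invalid record, a scan from redo point 150 yields stream_from = 300. A real implementation would also have to respect the point above about not requesting streaming when PrimaryConnInfo is unset.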

Status update for a commitfest entry.

This entry was "Waiting On Author" during this CF, so I've marked it as
returned with feedback. Feel free to resubmit an updated version to a
future commitfest.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
