Re: Measuring replay lag

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2016-12-21 13:14:42
Message-ID: CAHGQGwGANKWsH4jETZpucK7K0FZ8P70=9NEwgOJHPUzGxN0Z9A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 19, 2016 at 8:13 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut
> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>> On 11/22/16 4:27 AM, Thomas Munro wrote:
>>> Thanks very much for testing! New version attached. I will add this
>>> to the next CF.
>>
>> I don't see it there yet.
>
> Thanks for the reminder. Added here: https://commitfest.postgresql.org/12/920/
>
> Here's a rebased patch.

I agree that the ability to measure the remote_apply lag is very useful.
I'd also like to measure the remote_write and remote_flush lags, for
example to diagnose the cause of replication lag.

For that, what about maintaining the pairs of send-timestamp and LSN on
the *sender side* instead of the receiver side? That is, walsender adds a
(send-timestamp, LSN) pair to a buffer every sampling period.
Whenever walsender receives the write, flush and apply locations from
walreceiver, it calculates the write, flush and apply lags by comparing
each received LSN with the stored LSNs and taking the difference between
the current timestamp and the matching stored send-timestamp.
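
A minimal sketch of that sender-side bookkeeping, written in Python for
brevity rather than in PostgreSQL's C. The class name LagTracker, its
methods and the per-position cursors are hypothetical and not taken from
the patch; LSNs are plain integers and timestamps are seconds. It assumes
each sampled LSN should be matched at most once per position, so a lag
value is not recomputed after the standby has already reported reaching it.

```python
KINDS = ("write", "flush", "apply")


class LagTracker:
    """Hypothetical sender-side buffer of (lsn, send_time) samples."""

    def __init__(self):
        self.samples = []                    # (lsn, send_time), ascending LSN
        self.last_lsn = None                 # last LSN that was sampled
        self.cursor = {k: 0 for k in KINDS}  # next unmatched sample per kind
        self.lag = {k: None for k in KINDS}  # most recent lag per kind

    def record_send(self, lsn, now):
        # Called once per sampling period; skip if no new WAL was sent.
        if self.last_lsn is None or lsn > self.last_lsn:
            self.samples.append((lsn, now))
            self.last_lsn = lsn

    def on_reply(self, write_lsn, flush_lsn, apply_lsn, now):
        reported = dict(zip(KINDS, (write_lsn, flush_lsn, apply_lsn)))
        for kind, lsn in reported.items():
            i = self.cursor[kind]
            # Consume every sample the standby has now reached.
            while i < len(self.samples) and self.samples[i][0] <= lsn:
                i += 1
            if i > self.cursor[kind]:
                # Lag = time since the newest reached sample was sent.
                self.lag[kind] = now - self.samples[i - 1][1]
                self.cursor[kind] = i
        # Drop samples that all three positions have passed.
        done = min(self.cursor.values())
        del self.samples[:done]
        for kind in KINDS:
            self.cursor[kind] -= done
        return dict(self.lag)
```

With samples sent at LSN 100 (t=0.0) and LSN 200 (t=1.0), a reply at
t=1.5 reporting write=200, flush=100, apply=0 would give a write lag of
0.5s, a flush lag of 1.5s and no apply lag yet. A real implementation
would also need to bound the buffer size.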

As a bonus, this approach doesn't require adding a field to the reply
message that walreceiver can send back very frequently, which might
help reduce network overhead.

Regards,

--
Fujii Masao
