Re: Measuring replay lag

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2017-03-22 10:57:08
Message-ID: CANP8+jL4148BXELpGqTXO9VXTq0XdjhY1yCOForPmWL8q20JJg@mail.gmail.com
Lists: pgsql-hackers

On 21 March 2017 at 17:32, David Steele <david(at)pgmasters(dot)net> wrote:
> Hi Thomas,
>
> On 3/15/17 8:38 PM, Simon Riggs wrote:
>>
>> On 16 March 2017 at 08:02, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
>> wrote:
>>
>>> I agree that these states exist, but we disagree on what 'lag' really
>>> means, or, rather, which of several plausible definitions would be the
>>> most useful here.
>>>
>>> My proposal is that the *_lag columns should always report how long it
>>> took for recently written, flushed and applied WAL to be written,
>>> flushed and applied (and for the primary to know about it). By this
>>> definition, sent LSN = applied LSN is not a special case: we simply
>>> report how long that LSN took to be written, flushed and applied.
>>>
>>> Your proposal is that the *_lag columns should report how far in the
>>> past the standby is at each of the three stages with respect to the
>>> current end of WAL. By this definition when sent LSN = applied LSN we
>>> are currently in the 'A' state meaning 'caught up' and should show
>>> 00:00:00.
>>
>>
>> I accept your proposal for how we handle these, on condition that you
>> write up some docs that explain the subtle difference between the two,
>> so we can just show people the URL. That needs to explain clearly the
>> difference in an impartial way between "what is the most recent lag
>> measurement" and "how long until we are caught up" as possible
>> interpretations of these values. Thanks.
>
>
> This thread has been idle for six days. Please respond and/or post a new
> patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked
> "Returned with Feedback".

Thomas, are you even working on another version? You've not replied to
me or David, so it's difficult to know what happens next.

Not sure whether this is a 6 day lag, or whether we should show NULL
because we are up to date.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
