Re: Measuring replay lag

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2017-03-16 00:02:56
Message-ID: CAEepm=33y_FhfBq3WyDpLDCdeQt00fGhFSrpgosVb9gjy0=N9w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 16, 2017 at 12:07 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> There are two ways of knowing the lag: 1) by measurement/sampling,
> which is the main way this patch approaches this, 2) by direct
> observation that the LSNs match. Both are equally valid ways of
> establishing knowledge. Strangely (2) is the only one of those that is
> actually precise and yet you say it is bogus. It is actually the
> measurements which are approximations of the actual state.
>
> The reality is that the lag can change discontinuously between zero
> and non-zero. I don't think we should hide that from people.
>
> I suspect that your "entirely bogus" feeling comes from the point that
> we actually have 3 states, one of which has unknown lag.
>
> A) "Currently caught-up"
> WALSender LSN == WALReceiver LSN (info type (1))
> At this point the current lag is known precisely to be zero.
>
> B) "Work outstanding, no reply yet"
> Immediately after WALSenderLSN > WALReceiverLSN, when we haven't yet
> received a new reply
> We expect to stay in this state for however long it takes to receive a
> reply, which could be wal_receiver_status_interval or longer if the
> lag is greater. At this point we have no measurement of what the lag
> is. We could reply NULL since we don't know. We could reply with the
> last measured lag when we were last in state C, but if the new reply
> was delayed for more than that we'd need to reply that the lag is at
> least as high as the delay since last time we left state A.
>
> C) "Continuous flow"
> WALSenderLSN > WALReceiverLSN and we have received a reply
> (measurement, info type (2))
> This is the main case. Easy-ish!
>
> So I think we need to first agree that A and B states exist and how to
> report lag in each state.

I agree that these states exist, but we disagree on what 'lag' really
means, or, rather, which of several plausible definitions would be the
most useful here.

My proposal is that the *_lag columns should always report how long it
took for recently written, flushed and applied WAL to be written,
flushed and applied (and for the primary to know about it). By this
definition, sent LSN = applied LSN is not a special case: we simply
report how long that LSN took to be written, flushed and applied.

Your proposal is that the *_lag columns should report how far in the
past the standby is at each of the three stages with respect to the
current end of WAL. By this definition, when sent LSN = applied LSN we
are in the 'A' state, meaning 'caught up', and should show
00:00:00.

Here are two reasons I prefer my definition:

* you can trivially convert from my definition to yours on the basis
of existing information:

    CASE WHEN sent_location = replay_location
         THEN '00:00:00'::interval
         ELSE replay_lag END

but there is no way to get from your definition to mine

* lag numbers reported using my definition tell you how long each of
the synchronous replication levels takes, but with your definition
they only do that if you catch them at times when they aren't showing
the special case 00:00:00; a fast standby running any workload other
than a benchmark will often show all-caught-up 00:00:00, so the new
columns would be useless for that purpose
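
To make the first point concrete, here is a rough Python sketch of the
same conversion as the CASE expression above. It is only an
illustration: the function name caught_up_lag and the use of plain
integers for LSNs are assumptions made for this example, not part of
the patch. None stands in for SQL NULL, and a NULL comparison falls
through to the ELSE branch as it would in SQL.

```python
from datetime import timedelta
from typing import Optional


def caught_up_lag(
    sent_location: Optional[int],
    replay_location: Optional[int],
    replay_lag: Optional[timedelta],
) -> Optional[timedelta]:
    """Convert a 'how long recent WAL took' lag into a 'how far behind
    are we right now' lag (hypothetical helper, for illustration only).

    LSNs are modelled as plain integers; None models SQL NULL.
    """
    # In SQL, NULL = anything is not true, so the CASE falls through to
    # ELSE. Mirror that by requiring both locations to be known.
    if (
        sent_location is not None
        and replay_location is not None
        and sent_location == replay_location
    ):
        # Caught up: the second definition reports exactly zero.
        return timedelta(0)
    # Otherwise report the most recent measured lag unchanged.
    return replay_lag


if __name__ == "__main__":
    measured = timedelta(milliseconds=5)
    print(caught_up_lag(100, 100, measured))
    print(caught_up_lag(200, 100, measured))
```

The function discards the measured value whenever the standby is
caught up, so nothing recovers that value from its output. That is
why the conversion only works in one direction.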

--
Thomas Munro
http://www.enterprisedb.com
