Re: Measuring replay lag

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2017-01-03 12:06:37
Message-ID: CANP8+jLuWr=q46k=OG6pdkjdpLjBoBecVqjXdD12aEwd8m3x1A@mail.gmail.com
Lists: pgsql-hackers

On 21 December 2016 at 21:14, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the cause of replication lag.
>
> Good idea. I will think about how to make that work. There was a
> proposal to make writing and flushing independent[1]. I'd like that
> to go in. Then the write_lag and flush_lag could diverge
> significantly, and it would be nice to be able to see that effect as
> time (though you could already see it with LSN positions).

I think it has a much better chance now that the replies from apply
are OK. I'll look into it during this release cycle, but not right now.

>> For that, what about maintaining the pairs of send-timestamp and LSN in
>> *sender side* instead of receiver side? That is, walsender adds the pairs
>> of send-timestamp and LSN into the buffer every sampling period.
>> Whenever walsender receives the write, flush and apply locations from
>> walreceiver, it calculates the write, flush and apply lags by comparing
>> the received and stored LSN and comparing the current timestamp and
>> stored send-timestamp.
>
> I thought about that too, but I couldn't figure out how to make the
> sampling work. If the primary is choosing (LSN, time) pairs to store
> in a buffer, and the standby is sending replies at times of its
> choosing (when wal_receiver_status_interval has been exceeded), then
> you can't accurately measure anything.

Adding the line delay to this was very specifically excluded by Tom,
so that clock disparity between servers is not included in the measurement.

If the balance of opinion is in favour of including a measure of
complete roundtrip time then I'm OK with that.
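
For clarity, the sender-side sampling scheme quoted above would look
roughly like the following. This is a standalone, illustrative sketch
only, not walsender code; the names (LagSample, sample_buf,
record_sample, compute_lag) and the fixed buffer size are invented for
the example.

/*
 * Illustrative sketch only, not walsender code: the sender-side sampling
 * scheme described above.  The names (LagSample, sample_buf, record_sample,
 * compute_lag) and the fixed buffer size are invented for this example.
 */
#include <stdint.h>
#include <time.h>

typedef uint64_t XLogRecPtr;            /* stand-in for the real LSN type */

typedef struct LagSample
{
    XLogRecPtr      lsn;
    struct timespec sent_at;
} LagSample;

#define SAMPLE_BUF_SIZE 1024

static LagSample sample_buf[SAMPLE_BUF_SIZE];
static int  sample_head = 0;            /* next slot to overwrite */
static int  sample_count = 0;

/* Sender side: record an (LSN, send-timestamp) pair every sampling period. */
static void
record_sample(XLogRecPtr lsn)
{
    LagSample  *s = &sample_buf[sample_head];

    s->lsn = lsn;
    clock_gettime(CLOCK_MONOTONIC, &s->sent_at);
    sample_head = (sample_head + 1) % SAMPLE_BUF_SIZE;
    if (sample_count < SAMPLE_BUF_SIZE)
        sample_count++;
}

/*
 * Sender side: when a reply reports that the standby has written, flushed
 * or applied up to 'reported_lsn', the lag is the time elapsed since the
 * newest buffered sample at or below that LSN was sent.  Returns seconds,
 * or -1 if no sample qualifies.
 */
static double
compute_lag(XLogRecPtr reported_lsn)
{
    struct timespec now;
    double      best = -1;
    int         i;

    clock_gettime(CLOCK_MONOTONIC, &now);
    for (i = 0; i < sample_count; i++)
    {
        if (sample_buf[i].lsn <= reported_lsn)
        {
            double elapsed = (now.tv_sec - sample_buf[i].sent_at.tv_sec) +
                             (now.tv_nsec - sample_buf[i].sent_at.tv_nsec) / 1e9;

            if (best < 0 || elapsed < best)
                best = elapsed;         /* newest qualifying sample wins */
        }
    }
    return best;
}

Thomas's objection is visible here: since the standby chooses when to
send replies (per wal_receiver_status_interval), reported_lsn will
typically have moved well past the sample it ends up matching, so the
computed lag is only as accurate as the reply cadence allows.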

> You could fix that by making the standby send a reply *every time* it
> applies some WAL (like it does for transactions committing with
> synchronous_commit = remote_apply, though that is only for commit
> records), but then we'd be generating a lot of recovery->walreceiver
> communication and standby->primary network traffic, even for people
> who don't otherwise need it. It seems unacceptable.

I don't see why that would be unacceptable. If we do it for
remote_apply, why not also do it for other modes? Whatever the
reasoning was for remote_apply should work for the other modes as well.
I should add that I originally designed it to work that way, so it must
have been changed later.

This seems like a bug to me now that I look harder. The docs for
wal_receiver_status_interval say "Updates are sent each time the
write or flush positions change, or at least as often as specified by
this parameter." But it doesn't do that, as I think it should.

> Or you could fix that by setting the XACT_COMPLETION_APPLY_FEEDBACK
> bit in the xl_xinfo.xinfo for selected transactions, as a way to ask
> the standby to send a reply when that commit record is applied, but
> that only works for commit records. One of my goals was to be able to
> report lag accurately even between commits (very large data load
> transactions etc).

As we said, we do have keepalive records we could use for that.

> Or you could fix that by sending a list of 'interesting LSNs' to the
> standby, as a way to ask it to send a reply when those LSNs are
> applied. Then you'd need a circular buffer of (LSN, time) pairs in
> the primary AND a circular buffer of LSNs in the standby to remember
> which locations should generate a reply. This doesn't seem to be an
> improvement.
>
> That's why I thought that the standby should have the (LSN, time)
> buffer: it decides which samples to record in its buffer, using LSN
> and time provided by the sending server, and then it can send replies
> at exactly the right times. The LSNs don't have to be commit records,
> they're just arbitrary points in the WAL stream which we attach
> timestamps to. IPC and network overhead is minimised, and accuracy is
> maximised.

I'm dubious of keeping standby-side state, but I will review the patch.
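
For reference, my reading of that standby-side scheme, as a standalone
sketch: the receiver buffers (LSN, send-timestamp) samples picked from
the incoming stream at its own sampling rate, and once replay passes a
sample's LSN, that sample's send time is echoed back in a reply. This is
not the patch's code; ReplaySample and the helper names are invented.

/*
 * Standalone sketch, not the patch's code; ReplaySample and the helper
 * names are invented.  The receiver buffers (LSN, send-timestamp) samples
 * from the incoming stream; once replay passes a sample's LSN, the sample's
 * send time is echoed back in a reply.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef int64_t TimestampTz;        /* timestamp taken from the sender */

typedef struct ReplaySample
{
    XLogRecPtr  lsn;
    TimestampTz send_time;
} ReplaySample;

#define REPLAY_SAMPLES 256

static ReplaySample samples[REPLAY_SAMPLES];
static int  write_idx = 0;          /* where the receiver inserts */
static int  read_idx = 0;           /* oldest sample not yet replayed */

/* Receiver side: remember an (LSN, send time) pair from the stream. */
static bool
buffer_sample(XLogRecPtr lsn, TimestampTz send_time)
{
    int         next = (write_idx + 1) % REPLAY_SAMPLES;

    if (next == read_idx)
        return false;               /* buffer full: drop this sample */
    samples[write_idx].lsn = lsn;
    samples[write_idx].send_time = send_time;
    write_idx = next;
    return true;
}

/*
 * Recovery side: once replay has reached 'replayed_lsn', consume every
 * sample at or below it and return the newest send time so that it can be
 * echoed back in a single reply.  Returns false if no sample was reached.
 */
static bool
pop_replayed_samples(XLogRecPtr replayed_lsn, TimestampTz *echo_time)
{
    bool        found = false;

    while (read_idx != write_idx && samples[read_idx].lsn <= replayed_lsn)
    {
        *echo_time = samples[read_idx].send_time;
        read_idx = (read_idx + 1) % REPLAY_SAMPLES;
        found = true;
    }
    return found;
}

Under this reading the primary computes the lag as its own current time
minus the echoed send time, so only the primary's clock is involved; the
return network hop then becomes part of the measurement, which is the
roundtrip question above.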

>> As a bonus of this approach, we don't need to add the field into the replay
>> message that walreceiver can very frequently send back. Which might be
>> helpful in terms of networking overhead.
>
> For the record, these replies are only sent approximately every
> replay_lag_sample_interval (with variation depending on replay speed)
> and are only 42 bytes with the new field added.
>
> [1] https://www.postgresql.org/message-id/CA%2BU5nMJifauXvVbx%3Dv3UbYbHO3Jw2rdT4haL6CCooEDM5%3D4ASQ%40mail.gmail.com

We have time to make whatever changes are needed so this can be applied
in this release.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
