Re: Measuring replay lag

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2016-12-21 21:14:23
Message-ID: CAEepm=07jWJ6=rzaaBivpPy_aUfapv-xPdc3+0HKHEBws6K4jw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I agree that the capability to measure the remote_apply lag is very useful.
> Also I want to measure the remote_write and remote_flush lags, for example,
> in order to diagnose the cause of replication lag.

Good idea. I will think about how to make that work. There was a
proposal to make writing and flushing independent[1]. I'd like that
to go in. Then write_lag and flush_lag could diverge significantly,
and it would be nice to be able to see that effect expressed as time
(though you could already see it with LSN positions).

> For that, what about maintaining the pairs of send-timestamp and LSN in
> *sender side* instead of receiver side? That is, walsender adds the pairs
> of send-timestamp and LSN into the buffer every sampling period.
> Whenever walsender receives the write, flush and apply locations from
> walreceiver, it calculates the write, flush and apply lags by comparing
> the received and stored LSN and comparing the current timestamp and
> stored send-timestamp.

I thought about that too, but I couldn't figure out how to make the
sampling work. If the primary is choosing (LSN, time) pairs to store
in a buffer, and the standby is sending replies at times of its
choosing (when wal_receiver_status_interval has been exceeded), then
you can't accurately measure anything.
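
A rough sketch of what the primary could compute under the sender-side
scheme when the standby's reply is triggered by its status interval
rather than by replay. The names and numbers are hypothetical (not
PostgreSQL code), times are in seconds, and network delay is ignored:

```python
from bisect import bisect_right

# Hypothetical sender-side buffer of (lsn, send_time) samples, kept in
# increasing LSN order by the primary.
samples = [(100, 0.0), (200, 5.0)]


def sender_side_lag(samples, reported_lsn, now):
    """Return now minus the send time of the newest sample whose LSN the
    standby reports having reached, or None if it has reached none."""
    lsns = [lsn for lsn, _ in samples]
    i = bisect_right(lsns, reported_lsn)
    if i == 0:
        return None
    return now - samples[i - 1][1]


# The standby replays LSN 100 at t=1.0 (true lag 1.0 s), has replayed up
# to LSN 150 when its status interval expires, and its reply arrives at
# t=10.0.
print(sender_side_lag(samples, reported_lsn=150, now=10.0))  # 10.0
```

The estimate comes out as 10 seconds although LSN 100 was replayed 1
second after it was sent: the time the standby spent waiting for its
status interval is counted as lag.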

You could fix that by making the standby send a reply *every time* it
applies some WAL (like it does for transactions committing with
synchronous_commit = remote_apply, though that is only for commit
records), but then we'd be generating a lot of recovery->walreceiver
communication and standby->primary network traffic, even for people
who don't otherwise need it. It seems unacceptable.

Or you could fix that by setting the XACT_COMPLETION_APPLY_FEEDBACK
bit in the xl_xinfo.xinfo for selected transactions, as a way to ask
the standby to send a reply when that commit record is applied, but
that only works for commit records. One of my goals was to be able to
report lag accurately even between commits (very large data load
transactions etc).

Or you could fix that by sending a list of 'interesting LSNs' to the
standby, as a way to ask it to send a reply when those LSNs are
applied. Then you'd need a circular buffer of (LSN, time) pairs in
the primary AND a circular buffer of LSNs in the standby to remember
which locations should generate a reply. This doesn't seem to be an
improvement.

That's why I thought that the standby should have the (LSN, time)
buffer: it decides which samples to record in its buffer, using LSN
and time provided by the sending server, and then it can send replies
at exactly the right times. The LSNs don't have to be commit records;
they're just arbitrary points in the WAL stream to which we attach
timestamps. IPC and network overhead is minimised, and accuracy is
maximised.
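
A minimal sketch of that standby-side buffer, assuming the standby
echoes the primary's timestamp back in its reply and the primary
subtracts it from its own clock, so both times come from the same clock.
The class, method names and the sampling rule are hypothetical, not the
patch; sample_interval stands in for replay_lag_sample_interval:

```python
from collections import deque


class StandbyLagTracker:
    """Hypothetical standby-side buffer of (LSN, primary timestamp) samples."""

    def __init__(self, sample_interval, capacity=8):
        self.sample_interval = sample_interval
        # When full, appending discards the oldest sample.
        self.samples = deque(maxlen=capacity)
        self.last_sample_time = None

    def on_wal_received(self, lsn, primary_time):
        """Record (lsn, primary_time) if sample_interval has elapsed
        since the last recorded sample (judged by the primary's clock)."""
        if (self.last_sample_time is None
                or primary_time - self.last_sample_time >= self.sample_interval):
            self.samples.append((lsn, primary_time))
            self.last_sample_time = primary_time

    def on_replayed(self, replayed_lsn):
        """Return the primary timestamp to echo in a reply once replay has
        passed one or more samples (the newest one passed), else None."""
        echoed = None
        while self.samples and self.samples[0][0] <= replayed_lsn:
            echoed = self.samples.popleft()[1]
        return echoed


def primary_lag(echoed_time, now):
    # Both values are from the primary's clock, so no clock sync is needed.
    return now - echoed_time


tracker = StandbyLagTracker(sample_interval=1.0)
tracker.on_wal_received(100, 0.0)  # sampled
tracker.on_wal_received(150, 0.5)  # skipped: within the sample interval
tracker.on_wal_received(200, 1.0)  # sampled
echoed = tracker.on_replayed(120)  # replay passed LSN 100, echo 0.0
print(primary_lag(echoed, now=1.3))  # 1.3
```

A reply is only generated when replay actually passes a sampled LSN,
which is the point made above: the standby can reply at exactly the
right moment, and the LSNs need not be commit records.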

> As a bonus of this approach, we don't need to add the field into the replay
> message that walreceiver can very frequently send back. Which might be
> helpful in terms of networking overhead.

For the record, these replies are only sent approximately every
replay_lag_sample_interval (with variation depending on replay speed)
and are only 42 bytes with the new field added.
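
For a sense of where the 42 bytes come from: the standby status update
('r') message in the streaming replication protocol carries 34 bytes of
payload, so one extra 8-byte timestamp would account for the difference.
This assumes the new field is a single Int64; where it sits in the
message is a guess and does not affect the size:

```python
import struct

# Standby status update as documented for the streaming replication
# protocol: Byte1 'r', Int64 write LSN, Int64 flush LSN, Int64 apply LSN,
# Int64 client clock, Byte1 reply-requested flag.
# '>' means big-endian (network order) with no padding.
STATUS_UPDATE = ">BQQQqB"

# Assumption: the proposed field is one more Int64 timestamp.
STATUS_UPDATE_WITH_TIMESTAMP = ">BQQQqBq"

print(struct.calcsize(STATUS_UPDATE))                 # 34
print(struct.calcsize(STATUS_UPDATE_WITH_TIMESTAMP))  # 42
```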

[1] https://www.postgresql.org/message-id/CA%2BU5nMJifauXvVbx%3Dv3UbYbHO3Jw2rdT4haL6CCooEDM5%3D4ASQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
