Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs
Date: 2019-08-13 03:04:00
Message-ID: CA+hUKGKaB12rqG_WNpTwOB_-==v2Gs9ahbsVSsHOrMvmNh1Y4g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 13, 2019 at 2:20 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> > One thing I noticed in passing is that you always get the same times
> > in the write_lag and flush_lag columns, in --synchronous mode, and the
> > times update infrequently. That's not the case with regular
> > replicas; I suspect there is a difference in the time and frequency of
> > replies sent to the server, which I guess might make synchronous
> > commit a bit "lumpier", but I didn't dig further today.
>
> The messages are sent by pg_receivewal via sendFeedback() in
> receivelog.c. It gets triggered for the --synchronous case once a
> flush is done (but you are not surprised by my reply here, right!),
> and most likely the matches you are seeing come from the messages sent
> at the beginning of HandleCopyStream() where the flush and write
> LSNs are equal. This code behaves as I would expect based on your
> description and a read of the code I have just done to refresh my
> mind, but we may of course have some issues or potential
> improvements.

Right. For a replica server we call XLogWalRcvSendReply() after
writing, and then again inside XLogWalRcvFlush(). So the primary gets
to measure write_lag and flush_lag separately. If pg_receivewal just
sends one reply after flushing, then turning on --synchronous has the
effect of showing the flush lag in both write_lag and flush_lag
columns.
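
To make the effect concrete, here is a toy Python model of the difference.
It is only a sketch under stated assumptions: LagTracker, simulate() and the
write/flush costs are hypothetical names and numbers made up for
illustration, not PostgreSQL's actual lag-tracking code. The model assumes
the primary computes each lag as the arrival time of the first reply that
reports a given LSN, minus the time at which it sent that LSN.

```python
class LagTracker:
    """Toy model of a primary measuring lag from receiver replies.

    Hypothetical illustration only; not PostgreSQL's implementation.
    """

    def __init__(self):
        self.samples = []      # (lsn, time the primary sent that LSN)
        self.write_lag = None
        self.flush_lag = None
        self.last_write = 0    # highest write LSN reported so far
        self.last_flush = 0    # highest flush LSN reported so far

    def record_send(self, lsn, now):
        self.samples.append((lsn, now))

    def _lag_for(self, lsn, now):
        # Use the send time of the newest sample at or below the reported LSN.
        sent = [t for (sample_lsn, t) in self.samples if sample_lsn <= lsn]
        return now - sent[-1] if sent else None

    def handle_reply(self, write_lsn, flush_lsn, now):
        # A lag is only updated when the reply reports progress for it.
        if write_lsn > self.last_write:
            self.write_lag = self._lag_for(write_lsn, now)
            self.last_write = write_lsn
        if flush_lsn > self.last_flush:
            self.flush_lag = self._lag_for(flush_lsn, now)
            self.last_flush = flush_lsn


def simulate(reply_after_write, write_cost=1.0, flush_cost=4.0):
    """Ship one chunk of WAL and return (write_lag, flush_lag).

    reply_after_write=True models a receiver that replies after writing and
    again after flushing; False models a single reply sent after the flush.
    The costs are arbitrary time units chosen for the example.
    """
    tracker = LagTracker()
    now = 0.0
    lsn = 100
    tracker.record_send(lsn, now)

    now += write_cost                      # receiver writes the WAL
    if reply_after_write:
        tracker.handle_reply(write_lsn=lsn, flush_lsn=0, now=now)

    now += flush_cost                      # receiver flushes (blocking)
    tracker.handle_reply(write_lsn=lsn, flush_lsn=lsn, now=now)
    return tracker.write_lag, tracker.flush_lag


if __name__ == "__main__":
    print("two replies:", simulate(reply_after_write=True))
    print("one reply:  ", simulate(reply_after_write=False))
```

With these made-up costs, the two-reply receiver reports a write lag of 1.0
and a flush lag of 5.0, while the single-reply receiver reports 5.0 for
both, because the first time the primary hears about the write is in the
reply that also reports the flush.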

Of course those things aren't quite as independent as they should be
anyway, since the flush is blocking and therefore delays the next
write. <mind-reading-mode>That's why Simon probably wants to move the
flush to the WAL writer process, and Andres probably wants to change
the whole thing to use some kind of async IO[1].</mind-reading-mode>

[1] https://lwn.net/Articles/789024/

--
Thomas Munro
https://enterprisedb.com
