Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs
Date: 2019-08-12 23:15:42
Message-ID: CA+hUKGKcuR=25jeQtSOBwJeChxidCtC8y53jaMEUASLzhOyTrw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 17, 2019 at 1:52 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> I got surprised by the following behavior from pg_stat_get_wal_senders
> when connecting for example pg_receivewal to a primary:
> =# select application_name, flush_lsn, replay_lsn, flush_lag,
> replay_lag from pg_stat_replication;
>  application_name | flush_lsn | replay_lsn |    flush_lag    |   replay_lag
> ------------------+-----------+------------+-----------------+-----------------
>  receivewal       | null      | null       | 00:09:13.578185 | 00:09:13.578185
> (1 row)
>
> It makes little sense to me, as we are reporting a replay lag on a
> position which has never been reported yet, so it cannot actually be
> used as a comparison base for the lag. Am I missing something or
> should we return NULL for those fields if we have no write, flush or
> apply LSNs like in the attached?

Hmm. It's working as designed, but indeed it's not very newsworthy
information in this case. If you run pg_receivewal --synchronous then
you get sensible looking flush_lag times. Without that, flush_lag
only goes up, and of course replay_lag only goes up, so although it's
telling the truth, I think your proposal makes sense.
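
To make the proposed rule concrete, here is a minimal sketch in Python. It
is not the patch and not PostgreSQL code; SenderRow and hide_lag_without_lsn
are hypothetical names used only to model one row of the view. The rule
sketched is: show a lag only when the matching LSN has been reported.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class SenderRow(NamedTuple):
    """Hypothetical model of one pg_stat_replication row (illustration only)."""
    flush_lsn: Optional[str]
    replay_lsn: Optional[str]
    flush_lag: Optional[timedelta]
    replay_lag: Optional[timedelta]


def hide_lag_without_lsn(row: SenderRow) -> SenderRow:
    """Return the row with each lag set to None when its LSN is None."""
    return row._replace(
        flush_lag=row.flush_lag if row.flush_lsn is not None else None,
        replay_lag=row.replay_lag if row.replay_lsn is not None else None,
    )


# The row from the report above: no LSNs reported, yet both lags are set.
lag = timedelta(minutes=9, seconds=13, microseconds=578185)
reported = SenderRow(flush_lsn=None, replay_lsn=None,
                     flush_lag=lag, replay_lag=lag)
shown = hide_lag_without_lsn(reported)
```

With this rule the pg_receivewal row above would show NULL in both lag
columns, while a client that does report a flush position would keep its
flush_lag.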

One question I had is what would happen with your patch without
--synchronous, once it flushes a whole file and opens a new one; I
wondered if your new boring-information-hiding behaviour would stop
working after one segment file because of that. I tested that and the
column remains NULL when we move to a new file, so that's good.

One thing I noticed in passing is that you always get the same times
in the write_lag and flush_lag columns, in --synchronous mode, and the
times update infrequently. That's not the case with regular
replicas; I suspect there is a difference in the time and frequency of
replies sent to the server, which I guess might make synchronous
commit a bit "lumpier", but I didn't dig further today.

--
Thomas Munro
https://enterprisedb.com
