Re: pg_stat_replication.*_lag sometimes shows NULL during active replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Date: 2026-03-02 14:44:05
Message-ID: CAHGQGwE=kyQ+YnGPn8zpZ959+3ywg8OR_Nu__uXxxuE0E+Y_Zg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 24, 2026 at 3:54 PM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
>
> Hi hackers,
>
> I have noticed that pg_stat_replication.*_lag sometimes shows NULL
> when inserting a record per second for health checking. This happens
> when the startup process replays WAL fast enough before the
> walreceiver sends its flush notification to the walsender.
>
> Here is the sequence that triggers the issue: (See normal.svg and
> error.svg for diagrams of the normal and problematic cases.)
>
> 1. The walreceiver receives, writes, and flushes WAL, then wakes the
> startup process via WakeupRecovery().
>
> 2. The startup process replays all available WAL quickly, then calls
> WalRcvForceReply() to set force_reply = true and wakes the
> walreceiver.
>
> 3. The walreceiver sends a flush notification to the walsender
> (XLogWalRcvSendReply() in XLogWalRcvFlush()). Since the startup has
> already replayed the WAL by this point, this message reports the
> incremented applyPtr, which equals sentPtr. The walsender processes
> this message, consuming the LagTracker samples and setting
> fullyAppliedLastTime = true.
>
> 4. In the next loop iteration, the walreceiver sees force_reply = true
> and sends another reply with the same positions. The walsender sees
> applyPtr == sentPtr for the second consecutive time and sets
> clearLagTimes = true. Since the LagTracker samples were already
> consumed by step 3, all lag values are -1. With clearLagTimes = true,
> these -1 values are written to walsnd->*Lag, causing
> pg_stat_replication to show NULL.
>
> The comment in ProcessStandbyReplyMessage() says:
>
> * If the standby reports that it has fully replayed the WAL in two
> * consecutive reply messages, then the second such message must result
> * from wal_receiver_status_interval expiring on the standby.
>
> But as shown above, the second message can also come from
> WalRcvForceReply(), violating this assumption.
>
> The attached patch fixes this by adding a check that all lag values
> are -1 to the clearLagTimes condition. This ensures that clearLagTimes
> only triggers when there are truly no new lag samples in two
> consecutive messages (i.e., the system is genuinely idle), and not
> when the samples were simply consumed by a preceding message in a
> burst of replies.
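
To make the sequence in steps 3 and 4 concrete, here is a small standalone
model of the reply handling as described above. It is only a sketch, not the
actual code in walsender.c: ModelWalSnd and model_process_reply() are
hypothetical names, only the apply lag is tracked, and the lag sample is
passed in directly instead of being read from the LagTracker.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's XLogRecPtr */
typedef int64_t TimeOffset;		/* lag in microseconds; -1 means "no sample" */

/* Hypothetical, simplified walsender state (not the real WalSnd struct). */
typedef struct
{
	bool		fullyAppliedLastTime;
	TimeOffset	applyLag;		/* -1 is what the view would show as NULL */
} ModelWalSnd;

/*
 * Model of handling one standby reply.  sampledLag is the lag obtained for
 * this reply, or -1 if the samples were already consumed by an earlier reply.
 */
void
model_process_reply(ModelWalSnd *ws, XLogRecPtr sentPtr, XLogRecPtr applyPtr,
					TimeOffset sampledLag)
{
	bool		clearLagTimes = false;

	if (applyPtr == sentPtr)
	{
		/* Second consecutive "fully applied" reply: allow clearing. */
		if (ws->fullyAppliedLastTime)
			clearLagTimes = true;
		ws->fullyAppliedLastTime = true;
	}
	else
		ws->fullyAppliedLastTime = false;

	/* A -1 sample overwrites the stored lag only when clearing is allowed. */
	if (sampledLag != -1 || clearLagTimes)
		ws->applyLag = sampledLag;
}
```

Calling it twice with applyPtr == sentPtr, first with a real sample and then
with -1, leaves applyLag at -1 even though WAL was replayed just before, which
matches the NULL described in step 4.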

Thanks for the patch!

With the patch applied, I set up logical replication and inserted a row every
second. Even with continuous inserts, NULL still appeared in the lag columns of
pg_stat_replication. That makes me wonder whether the patch's approach is
sufficient to address the issue.

Relying solely on replies from the standby or subscriber seems a bit fragile to
me. If the goal is to keep showing the last measured lag for some time,
perhaps we should introduce a rate limit on when NULL is displayed in the lag
columns?

For example, if there has been no activity (i.e., sentPtr == applyPtr and
applyPtr has not changed since the previous cycle) for, say, 10 seconds,
then we could allow NULL to be shown. Thoughts?
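
A rough standalone sketch of the rate-limit idea, just to illustrate it. The
names here (LagClearState, lag_clear_allowed(), LAG_CLEAR_IDLE_USEC) are
hypothetical and not existing PostgreSQL symbols, and the 10 second threshold
is only the example value mentioned above:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's XLogRecPtr */
typedef int64_t TimestampTz;	/* microseconds, as in PostgreSQL */

/* Hypothetical threshold: how long the link must be idle before clearing. */
#define LAG_CLEAR_IDLE_USEC (10 * INT64_C(1000000))

/* Hypothetical per-walsender state for the rate limit. */
typedef struct
{
	XLogRecPtr	lastApplyPtr;	/* applyPtr seen in the previous cycle */
	TimestampTz idleSince;		/* when the current idle period started */
	bool		idle;			/* are we inside an idle period? */
} LagClearState;

/*
 * Return true if the lag columns may be reset to NULL: the standby is caught
 * up, applyPtr has not moved since the previous cycle, and that has been the
 * case for at least LAG_CLEAR_IDLE_USEC.
 */
bool
lag_clear_allowed(LagClearState *st, XLogRecPtr sentPtr, XLogRecPtr applyPtr,
				  TimestampTz now)
{
	bool		idle_now = (applyPtr == sentPtr && applyPtr == st->lastApplyPtr);

	st->lastApplyPtr = applyPtr;

	if (!idle_now)
	{
		/* Any activity ends the idle period. */
		st->idle = false;
		return false;
	}

	if (!st->idle)
	{
		/* First idle cycle: start the clock. */
		st->idle = true;
		st->idleSince = now;
	}

	return (now - st->idleSince) >= LAG_CLEAR_IDLE_USEC;
}
```

With something like this, a burst of replies carrying the same positions would
not clear the lag values by itself. Only a sustained idle period would.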

Regards,

--
Fujii Masao
