Re: pg_stat_replication.*_lag sometimes shows NULL during active replication

From: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Date: 2026-03-06 07:12:48
Message-ID: CAOzEurRGiGE2Dfe+ySpb=+93ku=7ZC6RgAbHtLC6Xsq3g2XexA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 2, 2026 at 11:44 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> With the patch applied, I set up a logical replication and inserted a row every
> second. Even with continuous inserts, NULL was shown in the lag columns of
> pg_stat_replication. That makes me wonder whether the patch's approach is
> sufficient to address the issue.

Thank you for the review and testing! I had only considered the issue
in the context of physical replication, but as you pointed out, my
approach is insufficient for logical replication.

> Relying solely on replies from the standby or subscriber seems a bit fragile to
> me. If the goal is to keep showing the last measured lag for some time,
> perhaps we should introduce a rate limit on when NULL is displayed in the lag
> columns?

My primary goal was to ensure that the source code comments match the
actual behavior, as the comment stating "the second such message must
result from wal_receiver_status_interval expiring on the standby" is
inaccurate. However, as you noted, the patch alone is not sufficient
to fully address the issue.

> For example, if there has been no activity (i.e., sentPtr == applyPtr and
> applyPtr has not changed since the previous cycle) for, say, 10 seconds,
> then we could allow NULL to be shown. Thought?

I considered a time-based rate limit, but it is difficult to choose an
appropriate threshold. Furthermore, the walsender has no way of
knowing the standby's or subscriber's wal_receiver_status_interval
setting.

The attached v2 patch takes a different approach: it additionally
requires that all reported positions (write/flush/apply) remain
unchanged from the previous reply. This directly detects a truly idle
system without relying on timeouts—if any position has advanced, new
WAL activity must have occurred, so we should not clear the lag values
even if the lag tracker is empty.
--
Best regards,
Shinya Kato
NTT OSS Center

Attachment Content-Type Size
v2-0001-Fix-spurious-NULL-lag-in-pg_stat_replication.patch application/octet-stream 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2026-03-06 07:14:55 Re: Adding REPACK [concurrently]
Previous Message Fujii Masao 2026-03-06 07:05:17 Re: Improve checks for GUC recovery_target_xid