Re: pg_stat_replication.*_lag sometimes shows NULL during active replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_stat_replication.*_lag sometimes shows NULL during active replication
Date: 2026-03-09 11:21:03
Message-ID: CAHGQGwH2h_R7FWPvEs3+NWLwHZoj9r96tUyRKi5haqxMc6FXiQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 6, 2026 at 4:13 PM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
>
> On Mon, Mar 2, 2026 at 11:44 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > With the patch applied, I set up a logical replication and inserted a row every
> > second. Even with continuous inserts, NULL was shown in the lag columns of
> > pg_stat_replication. That makes me wonder whether the patch's approach is
> > sufficient to address the issue.
>
> Thank you for the review and testing! I had only considered the issue
> in the context of physical replication, but as you pointed out, my
> approach is insufficient for logical replication.
>
> > Relying solely on replies from the standby or subscriber seems a bit fragile to
> > me. If the goal is to keep showing the last measured lag for some time,
> > perhaps we should introduce a rate limit on when NULL is displayed in the lag
> > columns?
>
> My primary goal was to ensure that the source code comments match the
> actual behavior, as the comment stating "the second such message must
> result from wal_receiver_status_interval expiring on the standby" is
> inaccurate. However, as you noted, the patch alone is not sufficient
> to fully address the issue.
>
> > For example, if there has been no activity (i.e., sentPtr == applyPtr and
> > applyPtr has not changed since the previous cycle) for, say, 10 seconds,
> > then we could allow NULL to be shown. Thoughts?
>
> I considered a time-based rate limit, but it is difficult to choose an
> appropriate threshold. Furthermore, the walsender has no way of
> knowing the standby's or subscriber's wal_receiver_status_interval
> setting.
>
> The attached v2 patch takes a different approach: it additionally
> requires that all reported positions (write/flush/apply) remain
> unchanged from the previous reply. This directly detects a truly idle
> system without relying on timeouts—if any position has advanced, new
> WAL activity must have occurred, so we should not clear the lag values
> even if the lag tracker is empty.

This approach looks good to me.

One comment: currently, the lag becomes NULL after about one
wal_receiver_status_interval of inactivity. OTOH, with this approach, it
seems it would take about two wal_receiver_status_interval periods.
Is this understanding correct?

Regards,

--
Fujii Masao
