| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_stat_replication.*_lag sometimes shows NULL during active replication |
| Date: | 2026-03-09 11:21:03 |
| Message-ID: | CAHGQGwH2h_R7FWPvEs3+NWLwHZoj9r96tUyRKi5haqxMc6FXiQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Mar 6, 2026 at 4:13 PM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
>
> On Mon, Mar 2, 2026 at 11:44 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > With the patch applied, I set up a logical replication and inserted a row every
> > second. Even with continuous inserts, NULL was shown in the lag columns of
> > pg_stat_replication. That makes me wonder whether the patch's approach is
> > sufficient to address the issue.
>
> Thank you for the review and testing! I had only considered the issue
> in the context of physical replication, but as you pointed out, my
> approach is insufficient for logical replication.
>
> > Relying solely on replies from the standby or subscriber seems a bit fragile to
> > me. If the goal is to keep showing the last measured lag for some time,
> > perhaps we should introduce a rate limit on when NULL is displayed in the lag
> > columns?
>
> My primary goal was to ensure that the source code comments match the
> actual behavior, as the comment stating "the second such message must
> result from wal_receiver_status_interval expiring on the standby" is
> inaccurate. However, as you noted, the patch alone is not sufficient
> to fully address the issue.
>
> > For example, if there has been no activity (i.e., sentPtr == applyPtr and
> > applyPtr has not changed since the previous cycle) for, say, 10 seconds,
> > then we could allow NULL to be shown. Thought?
>
> I considered a time-based rate limit, but it is difficult to choose an
> appropriate threshold. Furthermore, the walsender has no way of
> knowing the standby's or subscriber's wal_receiver_status_interval
> setting.
>
> The attached v2 patch takes a different approach: it additionally
> requires that all reported positions (write/flush/apply) remain
> unchanged from the previous reply. This directly detects a truly idle
> system without relying on timeouts—if any position has advanced, new
> WAL activity must have occurred, so we should not clear the lag values
> even if the lag tracker is empty.
This approach looks good to me.
One comment: currently, the lag becomes NULL basically after about one
wal_receiver_status_interval during periods of no activity. OTOH, with this
approach, it seems it would take about twice wal_receiver_status_interval.
Is this understanding correct?
Regards,
--
Fujii Masao
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Etsuro Fujita | 2026-03-09 11:41:04 | Re: Options to control remote transactions’ access/deferrable modes in postgres_fdw |
| Previous Message | Heikki Linnakangas | 2026-03-09 11:17:40 | Re: Refactor recovery conflict signaling a little |