| From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
|---|---|
| To: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_stat_replication.*_lag sometimes shows NULL during active replication |
| Date: | 2026-03-19 17:13:25 |
| Message-ID: | CAHGQGwGLUXmjC1+A1fzg-ynP1pdKC-0yfmLYcnnu4YJSEDnuQw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Mar 19, 2026 at 10:58 PM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
>
> On Tue, Mar 17, 2026 at 11:00 AM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 16, 2026 at 9:26 AM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
> > > Thank you for the v4 patch. I think this approach is better than mine.
> > > I tested the patch and confirmed that the issue no longer reproduces
> > > with physical replication. However, with logical replication, the lag
> > > columns in pg_stat_replication still show NULL periodically at
> > > wal_receiver_status_interval, since send_feedback() in worker.c can
> > > still send duplicate positions.
> >
> > I was thinking that if a feedback message triggered by
> > wal_receiver_status_interval has the same LSNs as the previous message,
> > it's expected for the lag columns to become NULL. But you see it differently,
> > don't you? Sorry, I failed to understand your point...
>
> Sorry for the confusion. I ran a script inserting one row every 0.5
> seconds under logical replication and confirmed that NULL still
> appears in the lag columns even while replication is actively running.
> I initially, and mistakenly, thought this was tied to
> wal_receiver_status_interval timing; that turned out to be unrelated.
>
> I haven't had time to investigate further, but my current impression
> is that the existing approach may not be sufficient for logical
> replication.
Thanks for the clarification! I understand your point now.
I think the issue occurs when all the positions in the first message point to
the same LSN (e.g., 0/030D5230), and the positions in the second message again
all point to a single LSN, but a larger one (e.g., 0/030D52E0).
I've updated the patch to address this. It removes fullyAppliedLastTime,
tracks the positions from the previous reply, and clears the lag values only
when the positions remain unchanged across two consecutive messages.
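To illustrate the idea, here is a rough standalone sketch. It is not the code
in the patch: the ReplyPositions struct, the should_clear_lag() function, and
the static variables are hypothetical names used only for this example, and
XLogRecPtr is redefined locally as a plain 64-bit integer so that the snippet
compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's WAL location type (assumption). */
typedef uint64_t XLogRecPtr;

/* Hypothetical container for the positions reported in one reply. */
typedef struct ReplyPositions
{
	XLogRecPtr	write;
	XLogRecPtr	flush;
	XLogRecPtr	apply;
} ReplyPositions;

/* Positions remembered from the previous reply (hypothetical state). */
static ReplyPositions prev_reply;
static bool have_prev_reply = false;

/*
 * Return true only when the current reply carries exactly the same
 * positions as the previous one, i.e. nothing advanced between two
 * consecutive messages.  The current positions are always remembered
 * for the next call.
 */
static bool
should_clear_lag(const ReplyPositions *cur)
{
	bool		unchanged;

	unchanged = have_prev_reply &&
		cur->write == prev_reply.write &&
		cur->flush == prev_reply.flush &&
		cur->apply == prev_reply.apply;

	prev_reply = *cur;
	have_prev_reply = true;

	return unchanged;
}
```

With this rule, a reply reporting 0/030D5230 for every position followed by
one reporting 0/030D52E0 for every position does not clear the lag values,
because the positions changed between the two messages. Only a further reply
that repeats 0/030D52E0 would clear them.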
Patch attached. Could you test and review this updated patch?
Regards,
--
Fujii Masao
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Avoid-sending-duplicate-WAL-locations-in-standby-.patch | application/octet-stream | 10.8 KB |