Re: Allow reading LSN written by walreciever, but not flushed yet

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow reading LSN written by walreciever, but not flushed yet
Date: 2025-05-14 06:54:05
Message-ID: CAFh8B==0DONwvHHvW_YBN1LSi=Gd3iPc=enqCk5vNDOfDwtq2g@mail.gmail.com
Lists: pgsql-hackers

Hi Fujii,

On Tue, 13 May 2025 at 13:13, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
wrote:

> In this case, doesn't the flush LSN typically catch up to the write LSN on
> node2 after a few seconds? Even if the walreceiver exits while there's still
> written but unflushed WAL, it looks like WalRcvDie() ensures everything is
> flushed by calling XLogWalRcvFlush(). So, isn't it safe to rely on the flush
> LSN when selecting the most advanced node? No?
>

I think it is a bit more complex than that. There are also cases where we
want to ensure that there are "healthy" standby nodes when a switchover is
requested.
The meaning of "healthy" could be something like: "According to the write
LSN, it is not lagging by more than 16MB" or similar.
Currently it is possible to extract this value using
pg_stat_get_wal_receiver()/pg_stat_wal_receiver, but that works only when the
walreceiver process is alive.
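
To make the "healthy" check concrete, here is a rough sketch of how an HA tool
might evaluate it. The function names and the way the LSNs are obtained are
assumptions for illustration; only the textual LSN format (two hexadecimal
numbers separated by a slash) and the 16MB example threshold come from the
discussion. The standby value is assumed to be read from something like
pg_stat_wal_receiver, so it is None when no walreceiver is running.

```python
from typing import Optional

# Example threshold from the discussion: a standby is "healthy" if its
# write LSN is not lagging by more than 16MB.
MAX_LAG_BYTES = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_healthy(primary_lsn: str, standby_write_lsn: Optional[str]) -> bool:
    """Hypothetical check: True if the write LSN lags by at most MAX_LAG_BYTES."""
    if standby_write_lsn is None:
        # No walreceiver process: the write LSN cannot be read at all,
        # which is the limitation described above.
        return False
    lag = parse_lsn(primary_lsn) - parse_lsn(standby_write_lsn)
    return lag <= MAX_LAG_BYTES
```

With this sketch, a standby whose walreceiver has just exited is reported as
unhealthy even if it has written all the WAL, simply because the value is not
visible anymore.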

> >>> Caveat: we already have a function pg_last_wal_receive_lsn(), which in
> >>> fact returns flushed LSN, not written. I propose to add a new function
> >>> which returns LSN actually written. Internals of this function are already
> >>> implemented (GetWalRcvWriteRecPtr()), but unused.
>
> GetWalRcvWriteRecPtr() returns walrcv->writtenUpto, which can move backward
> when the walreceiver restarts. This behavior is OK for your purpose?
>

IMO, most HA tools are prepared for it. They can't rely only on the
write/flush LSN, because a standby may be replaying WAL from the archive
using restore_command, and as a result only the replay LSN is progressing.
That is, they are supposed to be doing something like max(write_lsn,
replay_lsn).
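
A minimal sketch of that max(write_lsn, replay_lsn) logic, as an HA tool might
apply it when picking the most advanced node. The function names and the
shape of the input are hypothetical; a missing write LSN (None) stands for a
standby without a running walreceiver, for example one that is restoring WAL
from the archive.

```python
from typing import Dict, Optional, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def effective_lsn(write_lsn: Optional[str], replay_lsn: str) -> int:
    """max(write_lsn, replay_lsn), tolerating a missing or stale write LSN."""
    replay = parse_lsn(replay_lsn)
    if write_lsn is None:
        return replay
    # The write LSN may be behind the replay LSN, e.g. after a walreceiver
    # restart or while replaying from the archive, so take the larger one.
    return max(parse_lsn(write_lsn), replay)


def most_advanced(nodes: Dict[str, Tuple[Optional[str], str]]) -> str:
    """Return the name of the node with the highest effective LSN.

    `nodes` maps a node name to a (write_lsn, replay_lsn) pair.
    """
    return max(nodes, key=lambda name: effective_lsn(*nodes[name]))
```

Because the result is bounded below by the replay LSN, a write LSN that moves
backward after a walreceiver restart does not make the node look less
advanced than what it has already replayed.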

--
Regards,
--
Alexander Kukushkin
