Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From:	Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc:	Japin Li <japinli(at)hotmail(dot)com>, surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date:	2026-04-07 10:26:23
Message-ID:	CAE9k0P=9Kbk1-FB6ugg8a7nLxHkN1SFQbbRe8-tMjZwYHrXadw@mail.gmail.com
Lists:	pgsql-hackers

Hi,

On Tue, Apr 7, 2026 at 11:20 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Apr 7, 2026 at 9:04 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> >
> > I see your point. I agree that using wal_receiver_status_interval for
> > this test may not be a reliable way. Can we attempt using
> > pg_wal_replay_pause() on standby and then checking
> > wait_event=WaitForStandbyConfirmation with backend_type=walsender on
> > primary? Or do you see any issues in this approach that I might be
> > overlooking?
> >
>
> Yes, I think we can make use of the WAL replay pause/resume mechanism.
> This seems like the right approach, as it gives us a more controlled
> and deterministic way to validate the lagging behavior.
>

Looking at 049_wait_for_lsn.pl (the test case you referenced), it
explicitly stops the WAL receiver by setting primary_conninfo to an
empty string, rather than just pausing WAL replay. Using
pg_wal_replay_pause() alone only halts replay; the WAL receiver
continues running, keeps receiving WAL, and sends feedback/status to
the primary. That feedback is sufficient to advance restart_lsn on the
standby’s slot, so the restart_lsn < wait_for_lsn condition inside
StandbySlotsHaveCaughtup() would stop holding and the slot would no
longer count as lagging, which is not what we want for this test.
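
To make the difference concrete, here is a rough toy model in Python. It
is only a sketch and not PostgreSQL code: ToyStandby, run() and
slot_is_lagging() are hypothetical names, the starting restart_lsn is
made up, and it assumes the slot's restart_lsn simply follows the flush
position that the standby reports in its feedback.

```python
from dataclasses import dataclass
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class ToyStandby:
    """Hypothetical standby model, for illustration only."""
    flush_lsn: int = 0             # WAL received and flushed by the WAL receiver
    replay_lsn: int = 0            # WAL applied by the startup process
    receiver_running: bool = True  # False models primary_conninfo = ''
    replay_paused: bool = False    # True models pg_wal_replay_pause()

    def stream(self, primary_lsn: int) -> None:
        # Receiving and replaying are independent: pausing replay does not
        # stop the receiver from flushing newly streamed WAL.
        if self.receiver_running:
            self.flush_lsn = max(self.flush_lsn, primary_lsn)
        if not self.replay_paused:
            self.replay_lsn = max(self.replay_lsn, self.flush_lsn)

    def feedback(self) -> Optional[int]:
        # Assumption: feedback carries the flush position; a stopped
        # receiver sends nothing at all.
        return self.flush_lsn if self.receiver_running else None


def slot_is_lagging(restart_lsn: int, wait_for_lsn: int) -> bool:
    # Mirrors the restart_lsn < wait_for_lsn comparison discussed above.
    return restart_lsn < wait_for_lsn


def run(standby: ToyStandby, wait_for_lsn: int) -> bool:
    """Stream up to wait_for_lsn, apply feedback, report whether the slot lags."""
    restart_lsn = parse_lsn("0/1000000")  # arbitrary starting point
    standby.stream(wait_for_lsn)
    reported = standby.feedback()
    if reported is not None:
        restart_lsn = max(restart_lsn, reported)
    return slot_is_lagging(restart_lsn, wait_for_lsn)
```

In this model, run(ToyStandby(replay_paused=True), target) reports the
slot as not lagging, because the receiver still flushes and reports the
new WAL, whereas run(ToyStandby(receiver_running=False), target) keeps
the slot behind the target.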

This leads to the question: can we construct a realistic test case
where a failover standby remains active (WAL receiver running) while
its restart_lsn genuinely and consistently lags? This likely needs
further exploration.

--
With Regards,
Ashutosh Sharma.

In response to

Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication at 2026-04-07 05:50:41 from Ashutosh Sharma

Responses

Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication at 2026-04-07 11:48:13 from shveta malik
