Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: 2026-04-08 14:55:00
Message-ID: CAE9k0P=t-Yd8bq5kv6pnCq9XOCHnj9+vY1fvSKoSAubBeB9nVQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 8, 2026 at 6:23 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Wed, Apr 8, 2026 at 9:52 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Wed, Apr 8, 2026 at 7:39 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > If we only want to keep the slot active without advancing restart_lsn, we could
> > > start a replication connection and then acquire the slot with the help of
> > > the replication command: START_REPLICATION SLOT physical 0/01788488;
> > >
> > > E.g.,
> > >
> > > $standby->psql(
> > >     'postgres',
> > >     qq[START_REPLICATION SLOT physical 0/01788488;],
> > >     replication => 'database');
> > >
> >
> > I see your point. You are suggesting using psql as a replication
> > client (instead of a standby or pg_receivewal), which, unlike the
> > walreceiver on a standby, doesn’t send feedback to the walsender. In
> > that case, the slot stays acquired but its restart_lsn doesn’t
> > advance, leaving it active but lagging.
> >
> > While exploring this further, I found "019_replslot_limit.pl", which
> > uses SIGSTOP and SIGCONT to pause and resume the walsender process.
> > Pausing the walsender prevents it from streaming new WAL to the
> > standby, resulting in a slot that is active but lagging. I followed a
> > similar approach to build a test case that creates an active yet
> > lagging standby slot. This slot does not satisfy priority/quorum
> > conditions for synchronized_standby_slots, causing the logical
> > walsender to wait for standby confirmation. Once SIGCONT is sent to
> > the paused walsender, WAL streaming resumes and the logical walsender,
> > which was blocked waiting for standby confirmation, proceeds.
> >
>
> I was just trying out Hou-san's suggestion and I came up with a
> different approach. Attaching my modified test script.
> If you think it is better, feel free to use it.
>

Thanks, Ajin, for sharing your version based on Hou-san's suggestion.
This approach looks more robust and, since it does not rely on
SIGSTOP/SIGCONT, is also platform-independent, so in my view we should
proceed with it. I’ll review the changes, incorporate them into the
main patch, and share an updated patch.

--
With Regards,
Ashutosh Sharma.
