Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: 2026-06-04 07:36:09
Message-ID: CAE9k0Pkk6q72X3Rc3MUo7PxU46UcCzLfMhM02PGDUmAue9cDGg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Jun 4, 2026 at 9:14 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Jun 3, 2026 at 4:30 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Hi Shveta,
> >
> > On Fri, May 15, 2026 at 9:28 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > >
> > > Ashutosh, while testing further, I noticed that
> > > 'synchronized_standby_slots' does not filter out duplicate entries. For
> > > example, if a user lists one entry twice in a priority
> > > configuration, we end up waiting on one slot twice rather
> > > than on 2 different slots.
> > >
> > > Example:
> > > alter system set synchronized_standby_slots = 'FIRST 2 (standby_1,
> > > standby_1, standby_2, standby_3)';
> > > select pg_reload_conf();
> > > insert into tab1 values (10), (20), (30);
> > > select pg_logical_slot_get_binary_changes('sub1', NULL, NULL,
> > > 'proto_version', '4', 'publication_names', 'pub1');
> > >
> > > The last statement succeeds even though standby_2 and standby_3 do not
> > > exist. It counts standby_1 twice and concludes that the required number
> > > of slots has caught up.
> > >
> > > OTOH, if we use the same configuration for
> > > 'synchronous_standby_names', it correctly waits for standby_2 and does
> > > not count on standby_1 twice.
> > >
> > > alter system set synchronous_standby_names = 'FIRST 2 (standby_1,
> > > standby_1, standby_2, standby_3)';
> > > select pg_reload_conf();
> > > insert into tab1 values (10), (20), (30);  -- this waits for standby_2
> > >
> > > This is perhaps because 'synchronous_standby_names' waits on active
> > > WAL senders rather than on the (possibly repeated) strings in the
> > > configuration, whereas our code changes wait on the names present in
> > > 'synchronized_standby_slots' without filtering out duplicates.
> > >
> >
> > May I know what your expectation is here? Would you like the check
> > hook for synchronized_standby_slots to automatically resolve
> > duplicates into a unique set of values, or should it detect duplicate
> > entries and raise an error so that the user can correct the
> > configuration?
> >
> > If we automatically resolve duplicates, the user would still see the
> > GUC configured exactly as they specified, even though it would not
> > function the same way internally. For example, if a user sets:
> >
> > FIRST 2 (s1, s1, s1, s2)
> >
> > it might internally be resolved to:
> >
> > FIRST 2 (s1, s2)
> >
> > However, when the user runs SHOW, it would still display the original
> > configuration. This could give the user an incorrect impression of how
> > the setting is actually being interpreted. Because of this, I feel we
> > should treat duplicate entries as an invalid configuration and raise
> > an error.
> >
> > As far as synchronous_standby_names is concerned, I can see that
> > configurations such as:
> >
> > FIRST 2 (s1, s1, s1, s1)
> >
> > are currently accepted, which I don't think is correct either; they
> > should have been rejected, possibly resulting in a server startup
> > failure.
> >
>
> My preference, and original intent, was to accept duplicate entries
> and skip them internally. The docs can be updated to say 'duplicate
> entries are skipped'. A server startup failure due to duplicate entries
> in a GUC does not seem right to me. If the ALTER SYSTEM command fails
> due to duplicate entries, that is still fine, but a startup failure
> seems excessive. But let's see what others have to say on this.
>

Okay, the attached patches add the ability to automatically remove
duplicate entries from the synchronized_standby_slots list. In N-of-M
mode (FIRST N or ANY N), if N exceeds the number of distinct slots M
after duplicates are removed, an error is raised.
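A minimal sketch of the dedup-then-validate behavior described above, written in Python for brevity rather than the C of the actual patches. The helper names (dedup_slot_names, validate_num_sync, count_caught_up) are hypothetical and do not correspond to functions in the patches. Names are compared by exact string equality, which assumes they have already been parsed and normalized, and count_caught_up ignores FIRST N priority selection and ANY N quorum rules, simply counting how many listed slots have caught up. It shows why the earlier behavior let standby_1 satisfy a requirement of 2, and how removing duplicates first changes the count.

```python
from typing import Iterable, List, Set


def dedup_slot_names(names: Iterable[str]) -> List[str]:
    # Hypothetical helper: drop repeated names, keeping the first occurrence
    # so the original list order is preserved. dict keys keep insertion
    # order in Python 3.7+.
    return list(dict.fromkeys(names))


def validate_num_sync(num_sync: int, names: Iterable[str]) -> List[str]:
    # Hypothetical check: deduplicate first, then reject N > M, where M is
    # the number of distinct slot names left after deduplication.
    unique = dedup_slot_names(names)
    if num_sync > len(unique):
        raise ValueError(
            f"need {num_sync} slots, but only {len(unique)} "
            f"distinct name(s) listed")
    return unique


def count_caught_up(names: Iterable[str], caught_up: Set[str]) -> int:
    # Simplified: counts list entries whose slot has caught up. Priority
    # selection (FIRST N) and quorum rules (ANY N) are deliberately ignored.
    return sum(1 for name in names if name in caught_up)


if __name__ == "__main__":
    configured = ["standby_1", "standby_1", "standby_2", "standby_3"]
    caught_up = {"standby_1"}  # standby_2 and standby_3 do not exist

    # Counting raw entries: standby_1 is counted twice.
    print(count_caught_up(configured, caught_up))  # 2

    unique = validate_num_sync(2, configured)
    print(unique)  # ['standby_1', 'standby_2', 'standby_3']

    # Counting distinct entries: only 1 slot has caught up, so 2 is not met.
    print(count_caught_up(unique, caught_up))  # 1

    # A list like (s1, s1, s1, s1) collapses to one slot, so N = 2 > M = 1.
    try:
        validate_num_sync(2, ["s1", "s1", "s1", "s1"])
    except ValueError as err:
        print("rejected:", err)
```

Keeping the first occurrence preserves the listed order, which would matter for FIRST N if list position determines priority, as it does in synchronous_standby_names.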

This behavior has been documented, and test cases verifying the change
have been added.

A few other minor comments from [1] have also been addressed. Please
have a look at the attached patches with these changes.

[1] - https://www.postgresql.org/message-id/CAJpy0uCKGCkfCXCd%3DtsDH5e85x155LsdbZW46WpWfsZJUe82bw%40mail.gmail.com

--
With Regards,
Ashutosh Sharma.

Attachment Content-Type Size
0001-Refactor-syncrep-parsing-to-represent-bare-standby-l.patch application/octet-stream 3.1 KB
0003-Add-FIRST-N-and-N-.-priority-syntax-to-synchronized_.patch application/octet-stream 22.8 KB
0002-Add-ANY-N-semantics-to-synchronized_standby_slots.patch application/octet-stream 42.9 KB
