| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | RE: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date: | 2026-06-04 08:24:24 |
| Message-ID: | TY4PR01MB17718104B91F2945BE727467694102@TY4PR01MB17718.jpnprd01.prod.outlook.com |
| Lists: | pgsql-hackers |
On Thursday, June 4, 2026 3:36 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> On Thu, Jun 4, 2026 at 9:14 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Jun 3, 2026 at 4:30 PM Ashutosh Sharma
> <ashu(dot)coek88(at)gmail(dot)com> wrote:
> > > On Fri, May 15, 2026 at 9:28 AM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> > > >
> > > >
> > > > Ashutosh, while testing further, I noticed that
> > > > 'synchronized_standby_slots' does not filter duplicate entries. As an
> > > > example, if user ends up giving one entry twice in priority
> > > > configuration, then we will end up waiting on one slot twice rather
> > > > than waiting on 2 different slots.
> > > >
> > > > Example:
> > > > alter system set synchronized_standby_slots = 'FIRST 2 (standby_1,
> > > > standby_1, standby_2, standby_3)';
> > > > select pg_reload_conf();
> > > > insert into tab1 values (10), (20), (30);
> > > > select pg_logical_slot_get_binary_changes('sub1', NULL, NULL,
> > > > 'proto_version', '4', 'publication_names', 'pub1');
> > > >
> > > > The last statement works even though standby_2 and standby_3 do not
> > > > exist. It consumes standby_1 twice and thinks that the required number
> > > > of slots has caught-up.
> > > >
> > > > OTOH, if we use the same configuration for
> > > > 'synchronous_standby_names', it correctly waits for standby_2 and does
> > > > not count on standby_1 twice.
> > > >
> > > > alter system set synchronous_standby_names = 'FIRST 2 (standby_1,
> > > > standby_1, standby_2, standby_3)';
> > > > insert into tab1 values (10), (20), (30); ----> This will wait on standby_2
> > > >
> > > > This is perhaps because 'synchronous_standby_names ' waits on active
> > > > WAL senders rather than repeated strings in configuration. But our
> > > > code changes wait on the names present in
> 'synchronized_standby_slots'
> > > > without filtering out duplicates.
> > > >
> > >
> > > May I know what your expectation is here? Would you like the check
> > > hook for synchronized_standby_slots to automatically resolve
> > > duplicates into a unique set of values, or should it detect duplicate
> > > entries and raise an error so that the user can correct the
> > > configuration?
> > >
> > > If we automatically resolve duplicates, the user would still see the
> > > GUC configured exactly as they specified, even though it would not
> > > function the same way internally. For example, if a user sets:
> > >
> > > FIRST 2 (s1, s1, s1, s2)
> > >
> > > it might internally be resolved to:
> > >
> > > FIRST 2 (s1, s2)
> > >
> > > However, when the user runs SHOW, it would still display the original
> > > configuration. This could give the user an incorrect impression of how
> > > the setting is actually being interpreted. Because of this, I feel we
> > > should treat duplicate entries as an invalid configuration and raise
> > > an error.
> > >
> > > As far as synchronous_standby_names is concerned, I can see that
> > > configurations such as:
> > >
> > > FIRST 2 (s1, s1, s1, s1)
> > >
> > > are currently accepted, which I don't think is correct either and
> > > should have been rejected, possibly resulted in the server startup
> > > failure.
> > >
> >
> > My preference, and original intent, was to accept duplicate entries
> > and skip them internally. Doc can be updated to say 'duplicate entries
> > are skipped'. A server startup failure due to duplicate entries in a
> > GUC does not seem right to me. If the alter-system command fails due
> > to duplicate entries, that is still fine, but a startup failure seems
> > excessive. But let's see what others have to say on this.
> >
>
> Okay, the attached patch adds the capability to automatically remove
> duplicate entries from the synchronized_standby_slots list.

Thanks for updating the patch.

I agree with Shveta that reporting an ERROR is not ideal. I also think an ERROR would
be inconsistent with existing GUCs, as most of them, such as
synchronous_standby_names, search_path, and session_preload_libraries, do not
enforce uniqueness.

The most similar GUC, synchronous_standby_names, also clarifies this in the
documentation:

"There is no mechanism to enforce uniqueness of standby names. In case of
duplicates one of the matching standbys will be considered as higher priority,
though exactly which one is indeterminate."[1]

> In N of M
> mode, if N > M after removing duplicate entries, an error is raised.

I'm not entirely sure about this case. It seems similar to the case where the number of
specified slots is less than N (in ANY N or FIRST N), given that we want to skip
duplicate slots. In that situation, the natural behavior, to me, would be to
simply block replication rather than raise an error.
synchronous_standby_names also simply blocks the transaction in this case.
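
To make the "block rather than error" behavior concrete, here is a minimal sketch in Python, not the patch's actual C code. The function names are hypothetical, and the model is deliberately simplified: it ignores the priority ordering of FIRST and only counts how many distinct configured slots have caught up.

```python
def dedupe_preserving_order(names):
    """Drop repeated slot names, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


def should_keep_waiting(num_required, configured, caught_up):
    """Return True while fewer than num_required distinct slots have caught up.

    Hypothetical model: 'configured' is the raw list from the GUC and
    'caught_up' is the set of slot names that have confirmed the WAL position.
    """
    unique = dedupe_preserving_order(configured)
    confirmed = sum(1 for name in unique if name in caught_up)
    return confirmed < num_required


# FIRST 2 (standby_1, standby_1, standby_2, standby_3): standby_1 counts once.
slots = ["standby_1", "standby_1", "standby_2", "standby_3"]
still_waiting = should_keep_waiting(2, slots, {"standby_1"})

# FIRST 2 (s1, s1, s1, s1): only one distinct slot, so the wait never ends.
blocked = should_keep_waiting(2, ["s1", "s1", "s1", "s1"], {"s1"})
```

Under this model, a list that has fewer than N distinct slots after de-duplication is never satisfied, so the caller keeps waiting instead of failing at configuration time.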

[1] https://www.postgresql.org/docs/devel/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES

Best Regards,
Hou zj