synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: 2026-02-24 22:08:37
Message-ID: CAHg+QDfU7rOebrLDESPpHSgdiadKbpCOmBokcbmM6Gr+A5VobQ@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

synchronized_standby_slots requires every physical slot listed in the GUC
to have caught up before a logical failover slot is allowed to proceed with
decoding. This is an ALL-of-N semantic. It does not align with the
replication semantics of synchronous_standby_names, which can be configured
for quorum commit (ANY M of N).

In a typical 3-node HA deployment with quorum sync rep:

Primary, standby1 (corresponds to sb1_slot), standby2 (corresponds to
sb2_slot)
synchronized_standby_slots = 'sb1_slot, sb2_slot'
synchronous_standby_names = 'ANY 1 (standby1, standby2)'

If standby1 goes down, synchronous commits still succeed because standby2
satisfies the quorum. However, logical decoding blocks indefinitely in
WaitForStandbyConfirmation(), waiting for sb1_slot (which corresponds to
standby1) to catch up, even though the transaction is already safely
committed on a quorum of synchronous standbys. This blocks logical decoding
consumers from progressing and is inconsistent with the availability
guarantee the DBA intended by choosing quorum commit. This scenario is
constructed in the TAP test (052_synchronized_standby_slots_quorum.pl) in
the attached patch.
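
To make the difference concrete, here is a minimal standalone sketch of the
two checks. It is not taken from the patch or from the PostgreSQL sources:
the StandbySlot struct, the Lsn typedef, and both function names are
hypothetical stand-ins, and real slots track more state than a single LSN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not the server's own type. */
typedef uint64_t Lsn;

/* Hypothetical, simplified view of one physical standby slot. */
typedef struct StandbySlot {
    const char *name;
    Lsn         confirmed_lsn;  /* how far this standby has confirmed */
} StandbySlot;

/* Count the slots that have confirmed at least wait_for_lsn. */
static size_t count_caught_up(const StandbySlot *slots, size_t nslots,
                              Lsn wait_for_lsn)
{
    size_t caught_up = 0;

    for (size_t i = 0; i < nslots; i++) {
        if (slots[i].confirmed_lsn >= wait_for_lsn)
            caught_up++;
    }
    return caught_up;
}

/* ALL-of-N: every listed slot must have caught up. */
bool all_slots_caught_up(const StandbySlot *slots, size_t nslots,
                         Lsn wait_for_lsn)
{
    return count_caught_up(slots, nslots, wait_for_lsn) == nslots;
}

/* ANY M of N: at least `required` of the listed slots must have caught up. */
bool quorum_slots_caught_up(const StandbySlot *slots, size_t nslots,
                            size_t required, Lsn wait_for_lsn)
{
    assert(required >= 1 && required <= nslots);
    return count_caught_up(slots, nslots, wait_for_lsn) >= required;
}
```

With sb1_slot stuck behind the target LSN and sb2_slot at or past it, the
ALL-of-N check stays false for as long as standby1 is down, while the
quorum check with required = 1 is already true.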

*Proposal:*

Make synchronized_standby_slots quorum-aware, i.e., extend the GUC to
accept an ANY M (slot1, slot2, ...) syntax similar to
synchronous_standby_names, so that StandbySlotsHaveCaughtup() can return
true when M of N slots (where 1 <= M <= N) have caught up. I still prefer
keeping two separate GUCs, because the list of slots to be synchronized can
differ from the list of synchronous standbys (for example, a DBA may want a
geo standby to be in sync before allowing the logical decoding client to
read the changes). I kept the synchronized_standby_slots parse logic
similar to that of synchronous_standby_names to keep things simple. The
default behavior of synchronized_standby_slots is unchanged: a plain list
of slots still means ALL.
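
As an illustration of the proposed syntax only, the following standalone
sketch accepts either a plain list (treated as ALL) or ANY M (...) and
checks 1 <= M <= N. It is not the parser from the patch: SlotListSpec and
parse_slot_spec are hypothetical names, slot names are only counted rather
than validated, and an empty value is rejected here for brevity.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the setting. */
typedef struct SlotListSpec {
    int num_required;  /* M; equals num_slots for a plain list (ALL) */
    int num_slots;     /* N */
} SlotListSpec;

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/* Count comma-separated names in [begin, end); -1 if any name is empty. */
static int count_names(const char *begin, const char *end)
{
    int         count = 0;
    const char *p = begin;

    for (;;) {
        bool seen = false;

        while (p < end && *p != ',') {
            if (!isspace((unsigned char) *p))
                seen = true;
            p++;
        }
        if (!seen)
            return -1;
        count++;
        if (p >= end)
            break;
        p++;  /* skip the comma */
    }
    return count;
}

/* Parse "a, b" (ALL) or "ANY M (a, b)"; returns false on invalid input. */
bool parse_slot_spec(const char *value, SlotListSpec *out)
{
    const char *p = skip_ws(value);
    int         n;

    if (toupper((unsigned char) p[0]) == 'A' &&
        toupper((unsigned char) p[1]) == 'N' &&
        toupper((unsigned char) p[2]) == 'Y' &&
        isspace((unsigned char) p[3])) {
        char       *after_num;
        long        m = strtol(p + 3, &after_num, 10);
        const char *open;
        const char *close;

        if (after_num == p + 3)
            return false;               /* no number after ANY */
        open = skip_ws(after_num);
        if (*open != '(')
            return false;
        close = strchr(open, ')');
        if (close == NULL || *skip_ws(close + 1) != '\0')
            return false;               /* missing ')' or trailing junk */
        n = count_names(open + 1, close);
        if (n < 0 || m < 1 || m > n)
            return false;               /* enforce 1 <= M <= N */
        out->num_required = (int) m;
        out->num_slots = n;
        return true;
    }

    /* Plain list: the existing ALL-of-N behavior. */
    n = count_names(p, p + strlen(p));
    if (n < 0)
        return false;
    out->num_required = n;
    out->num_slots = n;
    return true;
}
```

Under these assumptions, 'sb1_slot, sb2_slot' parses to M = N = 2, while
'ANY 1 (sb1_slot, sb2_slot)' parses to M = 1, N = 2, and 'ANY 3 (a, b)' is
rejected because M exceeds N.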

I have attached a draft patch (AI-assisted). Please let me know your
thoughts.

Thanks,
Satya

Attachment Content-Type Size
0001-Add-quorum-support-to-synchronized_standby_slots.patch application/octet-stream 29.8 KB
