Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: 2026-03-18 04:45:16
Message-ID: CAE9k0PkhdZCXrpCPRdezb0-Y_iUMgHWfUCBzNJR5reV8XaoRDg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks again for reviewing the patch. Please my my responses inline below:

On Mon, Mar 16, 2026 at 4:57 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Mar 16, 2026 at 2:33 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > > Will review the rest of the changes.
> > >
> >
> > Sure, thanks.
> >
>
> Please find comments for the review so far:
>
> 1)
> + Specifies which streaming replication standby server slots logical WAL
> + sender processes must wait for before delivering decoded changes.
>
> Shall we rephrase a bit for better readability:
>
> Specifies the streaming replication standby server slots that logical
> WAL sender processes must wait for before delivering decoded changes.
>

The suggested change looks good, applied in the attached patch.

> 2)
> + If a slot is missing or invalidated, it will be skipped.
>
> Should we mention 'inactive' too?
>

Yes, mentioned about all the slots that are being skipped.

> 3)
> + The keyword <literal>ANY</literal>, coupled with
> + <replaceable class="parameter">num_sync</replaceable>, specifies
> + quorum-based semantics. Logical decoding proceeds once at least
> + <replaceable class="parameter">num_sync</replaceable> of the listed
> + slots have caught up. Missing, invalidated, or lagging slots are
> + skipped when looking for the required number of caught-up slots.
> + For example, a setting of <literal>ANY 1 (sb1_slot, sb2_slot)</literal>
> + will allow logical decoding to proceed as soon as either physical slot
> + has confirmed WAL receipt. This is useful in conjunction with
> + quorum-based synchronous replication
> + (<literal>synchronous_standby_names = 'ANY ...'</literal>), so that
> + logical decoding availability matches the commit durability guarantee.
>
> IMO, it is not completely correct to say 'Missing, invalidated, or
> lagging slots are skipped'. This is because lagging slots are not
> skipped outright; they just don’t contribute toward the quorum until
> they catch up. If all slots are lagging, logical decoding still waits
> for enough slots to catch up.
>
> Should we instead say this: (or anything better you have in mind)
>
> The keyword ANY, coupled with num_sync, specifies quorum-based
> semantics. Logical decoding proceeds once at least num_sync of the
> listed slots have caught up. Missing or invalidated slots are skipped
> when determining candidates, and lagging slots simply do not count
> toward the required number until they catch up, i.e., there is no
> priority ordering. For example, a setting of ANY 1 (sb1_slot,
> sb2_slot) allows .......
>

Looks good, applied this as well in the attached patch.

> 3)
> Existing doc:
> Note that logical replication will not proceed if the slots specified
> in synchronized_standby_slots do not exist or are invalidated.
>
> Shall we change it to:
> 'if the slots' --> 'if the required number of physical slots'
>

Changed as suggested.

> 4)
>
> + <literal>FIRST</literal> and <literal>ANY</literal> are
> case-insensitive.
> + If these keywords are used as the name of a replication slot,
> + the <replaceable class="parameter">slot_name</replaceable> must
> + be double-quoted.
> + </para>
> + <para>
> + This guarantees that logical replication
> failover slots do not consume changes until those changes are received
> + and flushed to the required physical standbys. If a
>
> The sentence 'This guarantees that ...' does not appear to follow
> naturally from the preceding sentence about FIRST and ANY. The
> reference of 'This' is unclear. For better readability, should we
> replace 'This guarantees that' with 'The use of
> synchronized_standby_slots guarantees that ...'?
>

We can. Modified likewise in the attached patch.

> 5)
> slot.c:
>
> + * The layout mirrors SyncRepConfigData so that the same quorum / priority
>
> quorum / priority --> quorum/priority
>

Fixed.

> 6)
> + * This reuses the syncrep_yyparse / syncrep_scanner infrastructure that is
>
> syncrep_yyparse / syncrep_scanner --> syncrep_yyparse/syncrep_scanner
>

Fixed.

> 7)
> IsExplicitFirstSyncStandbySlotsSyntax
>
> Should we rename to: IsPrioritySyncStandbySlotsSyntax()
>
> 8)
> I would like to understand how synchronous_standby_names deal with
> such a case: 'first as server name or FIRST as priority syntax'. I
> could not find any such function/logic there.
>

Renamed as suggested.

> 9)
> IsExplicitFirstSyncStandbySlotsSyntax():
>
> + /* Explicit FIRST syntax then requires a parenthesized member list */
> + while (*p && isspace((unsigned char) *p))
> + p++;
> +
> + return (*p == '(');
>
> I could not find a case where we can reach above flow without '('. If
> we try to give without parenthese (say: first 1 slot1,slot2,slot3), it
> will be caught by syncrep_yyparse() itself. Shouldn't just checking
> FIRST + space(s)+ digit suffice? Or let me know if I am missing any
> case.
>

Without '(', control would never reach this point. I would still
retain it to ensure that when this function returns true, it has fully
validated the 'FIRST (' syntax.

> 10)
> + * To distinguish simple list from explicit "FIRST N (...)", check
> + * whether the value starts with the FIRST keyword (after whitespace).
>
> It will be good to give example:
>
> To distinguish simple list starting with 'first' such as 'firstslot,
> secondslot',...
>
> And it will be better if we move this comment inside if-block atop
> 'IsExplicitFirstSyncStandbySlotsSyntax' call.
>

Done.

> 11)
> + * Simple list (e.g., "slot1, slot2"):
> + * ALL slots must be present, valid, and caught up. Returns false
> + * immediately if any slot is missing, invalid, or lagging.
>
> We can simply say:
> ALL slots must have caught up, returns false otherwise.
>
> But if we want to mention the states too, we shall mention 'inactive'
> as well, plus 'logical' to make the information complete.
>
> For rest of the comment too:
> missing/invalid --> missing/invalid/inactive/logical
>

Fixed.

Other than these, I have also changed elevel to DEBUG1 for the message
reported for lagging slots. PFA patch with all these changes.

--
With Regards,
Ashutosh Sharma.

Attachment Content-Type Size
v20260317-0001-Add-FIRST-N-and-ANY-N-syntax-support-to-synchronized.patch application/octet-stream 46.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-03-18 04:53:51 Re: Return pg_control from pg_backup_stop().
Previous Message Bertrand Drouvot 2026-03-18 04:36:04 Re: Adding locks statistics