| From: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date: | 2026-03-18 04:45:16 |
| Message-ID: | CAE9k0PkhdZCXrpCPRdezb0-Y_iUMgHWfUCBzNJR5reV8XaoRDg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
Thanks again for reviewing the patch. Please my my responses inline below:
On Mon, Mar 16, 2026 at 4:57 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Mar 16, 2026 at 2:33 PM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > > Will review the rest of the changes.
> > >
> >
> > Sure, thanks.
> >
>
> Please find comments for the review so far:
>
> 1)
> + Specifies which streaming replication standby server slots logical WAL
> + sender processes must wait for before delivering decoded changes.
>
> Shall we rephrase a bit for better readability:
>
> Specifies the streaming replication standby server slots that logical
> WAL sender processes must wait for before delivering decoded changes.
>
The suggested change looks good, applied in the attached patch.
> 2)
> + If a slot is missing or invalidated, it will be skipped.
>
> Should we mention 'inactive' too?
>
Yes, mentioned about all the slots that are being skipped.
> 3)
> + The keyword <literal>ANY</literal>, coupled with
> + <replaceable class="parameter">num_sync</replaceable>, specifies
> + quorum-based semantics. Logical decoding proceeds once at least
> + <replaceable class="parameter">num_sync</replaceable> of the listed
> + slots have caught up. Missing, invalidated, or lagging slots are
> + skipped when looking for the required number of caught-up slots.
> + For example, a setting of <literal>ANY 1 (sb1_slot, sb2_slot)</literal>
> + will allow logical decoding to proceed as soon as either physical slot
> + has confirmed WAL receipt. This is useful in conjunction with
> + quorum-based synchronous replication
> + (<literal>synchronous_standby_names = 'ANY ...'</literal>), so that
> + logical decoding availability matches the commit durability guarantee.
>
> IMO, it is not completely correct to say 'Missing, invalidated, or
> lagging slots are skipped'. This is because lagging slots are not
> skipped outright; they just don’t contribute toward the quorum until
> they catch up. If all slots are lagging, logical decoding still waits
> for enough slots to catch up.
>
> Should we instead say this: (or anything better you have in mind)
>
> The keyword ANY, coupled with num_sync, specifies quorum-based
> semantics. Logical decoding proceeds once at least num_sync of the
> listed slots have caught up. Missing or invalidated slots are skipped
> when determining candidates, and lagging slots simply do not count
> toward the required number until they catch up, i.e., there is no
> priority ordering. For example, a setting of ANY 1 (sb1_slot,
> sb2_slot) allows .......
>
Looks good, applied this as well in the attached patch.
> 3)
> Existing doc:
> Note that logical replication will not proceed if the slots specified
> in synchronized_standby_slots do not exist or are invalidated.
>
> Shall we change it to:
> 'if the slots' --> 'if the required number of physical slots'
>
Changed as suggested.
> 4)
>
> + <literal>FIRST</literal> and <literal>ANY</literal> are
> case-insensitive.
> + If these keywords are used as the name of a replication slot,
> + the <replaceable class="parameter">slot_name</replaceable> must
> + be double-quoted.
> + </para>
> + <para>
> + This guarantees that logical replication
> failover slots do not consume changes until those changes are received
> + and flushed to the required physical standbys. If a
>
> The sentence 'This guarantees that ...' does not appear to follow
> naturally from the preceding sentence about FIRST and ANY. The
> reference of 'This' is unclear. For better readability, should we
> replace 'This guarantees that' with 'The use of
> synchronized_standby_slots guarantees that ...'?
>
We can. Modified likewise in the attached patch.
> 5)
> slot.c:
>
> + * The layout mirrors SyncRepConfigData so that the same quorum / priority
>
> quorum / priority --> quorum/priority
>
Fixed.
> 6)
> + * This reuses the syncrep_yyparse / syncrep_scanner infrastructure that is
>
> syncrep_yyparse / syncrep_scanner --> syncrep_yyparse/syncrep_scanner
>
Fixed.
> 7)
> IsExplicitFirstSyncStandbySlotsSyntax
>
> Should we rename to: IsPrioritySyncStandbySlotsSyntax()
>
> 8)
> I would like to understand how synchronous_standby_names deal with
> such a case: 'first as server name or FIRST as priority syntax'. I
> could not find any such function/logic there.
>
Renamed as suggested.
> 9)
> IsExplicitFirstSyncStandbySlotsSyntax():
>
> + /* Explicit FIRST syntax then requires a parenthesized member list */
> + while (*p && isspace((unsigned char) *p))
> + p++;
> +
> + return (*p == '(');
>
> I could not find a case where we can reach above flow without '('. If
> we try to give without parenthese (say: first 1 slot1,slot2,slot3), it
> will be caught by syncrep_yyparse() itself. Shouldn't just checking
> FIRST + space(s)+ digit suffice? Or let me know if I am missing any
> case.
>
Without '(', control would never reach this point. I would still
retain it to ensure that when this function returns true, it has fully
validated the 'FIRST (' syntax.
> 10)
> + * To distinguish simple list from explicit "FIRST N (...)", check
> + * whether the value starts with the FIRST keyword (after whitespace).
>
> It will be good to give example:
>
> To distinguish simple list starting with 'first' such as 'firstslot,
> secondslot',...
>
> And it will be better if we move this comment inside if-block atop
> 'IsExplicitFirstSyncStandbySlotsSyntax' call.
>
Done.
> 11)
> + * Simple list (e.g., "slot1, slot2"):
> + * ALL slots must be present, valid, and caught up. Returns false
> + * immediately if any slot is missing, invalid, or lagging.
>
> We can simply say:
> ALL slots must have caught up, returns false otherwise.
>
> But if we want to mention the states too, we shall mention 'inactive'
> as well, plus 'logical' to make the information complete.
>
> For rest of the comment too:
> missing/invalid --> missing/invalid/inactive/logical
>
Fixed.
Other than these, I have also changed elevel to DEBUG1 for the message
reported for lagging slots. PFA patch with all these changes.
--
With Regards,
Ashutosh Sharma.
| Attachment | Content-Type | Size |
|---|---|---|
| v20260317-0001-Add-FIRST-N-and-ANY-N-syntax-support-to-synchronized.patch | application/octet-stream | 46.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-03-18 04:53:51 | Re: Return pg_control from pg_backup_stop(). |
| Previous Message | Bertrand Drouvot | 2026-03-18 04:36:04 | Re: Adding locks statistics |