| From: | Japin Li <japinli(at)hotmail(dot)com> |
|---|---|
| To: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
| Cc: | surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date: | 2026-03-24 04:12:19 |
| Message-ID: | SY7PR01MB10921E832261393DBE4E55C79B648A@SY7PR01MB10921.ausprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi, Ashutosh
On Mon, 23 Mar 2026 at 16:28, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> On Mon, Mar 23, 2026 at 9:51 AM surya poondla <suryapoondla4(at)gmail(dot)com> wrote:
>>
>> Hi All,
>>
>> Thank you for reporting a real gap and building this feature to address it. Very nice points were discussed in this thread.
>>
>> I reviewed the v20260318 patch and some comments.
>>
>> Documentation comments:
>> 1. FIRST mode does not specify what happens when valid slots < N
>> "If a slot is missing, logical, invalidated, or inactive, it will be
>> skipped. However, if a slot exists and is valid and active but has
>> not yet caught up, the system will wait for it rather than skipping
>> to lower-priority slots."
>> This paragraph explains the skip/wait distinction clearly, but
>> doesn't clearly address what happens when, after skipping all
>> missing/invalid/inactive/logical slots, the number of remaining
>> valid slots is less than num_sync?
>>
>> For example, with FIRST 2 (sb1_slot, sb2_slot, sb3_slot): if sb1_slot and sb2_slot are both invalidated and only sb3_slot is valid but lagging FIRST 2 requires two slots, but only one candidate remains.
>>
>> Looking at the code in StandbySlotsHaveCaughtup(), when syncrep_method == SYNC_REP_PRIORITY and a slot is lagging, the code does:
>> if (wait_for_all || synchronized_standby_slots_config->syncrep_method == SYNC_REP_PRIORITY)
>> break;
>>
>> So the function breaks out of the loop and returns false. This is
>> the correct behavior, but it is not stated anywhere in the
>> documentation. A user encountering this scenario will not know
>> whether to expect a wait or an error. The documentation should state
>> explicitly that in FIRST mode, if fewer valid slots than num_sync
>> are available, logical decoding waits indefinitely.
>>
>
> Agreed, have added a note about this in the documentation section for
> synchronized_standby_slots.
>
>> 2. "Missing, logical, invalidated, or inactive slots are skipped when determining candidates, and lagging slots simply do not count toward the required number until they catch up"
>> This is correct for the case where some slots are skipped and others
>> have caught up. But it does not address the case where all listed
>> slots are lagging and every slot is healthy and connected, but none
>> have reached wait_for_lsn yet. In that situation, the code records
>> each slot as SS_SLOT_LAGGING, does goto next_slot for each (because
>> syncrep_method == SYNC_REP_QUORUM), and returns false because
>> caught_up_slot_num < required. Logical decoding waits.
>>
>> You can append the following sentence to the above documentation paragraph "If fewer than num_sync slots have caught up at a given moment, logical decoding waits until that threshold is reached."
>>
>
> Agreed, have updated the documentation accordingly.
>
>>
>> Test comments:
>> 1. "PART D: Verify FIRST N priority semantics. # 3. Wait for valid but lagging slots (not skip to lower priority)"
>> The test implements this by calling $standby1->stop. A stopped standby has no active_pid, so the slot is classified as SS_SLOT_INACTIVE, not SS_SLOT_LAGGING.
>> SS_SLOT_LAGGING means it is connected and streaming but restart_lsn < wait_for_lsn.
>>
>> As Hou previously mentioned, recovery_min_apply_delay on standby1
>> would be one way to keep it connected while forcing its WAL
>> application to lag, exercising the SS_SLOT_LAGGING code path
>> directly. It is worth adding a test that covers this, both for FIRST
>> (to confirm it blocks) and for ANY (to confirm it does not).
>>
>
> I've added a test case for this, though I should flag that it may not
> be fully deterministic. I've done my best to minimize flakiness, but
> I'm not sure if that's acceptable, happy to hear thoughts on whether
> such a test is worth keeping or if it should be reworked.
>
>> + /*
>> + * Allocate array to track slot states. Size it to the total number of
>> + * configured slots since in the worst case all could have problem states.
>> + */
>> + slot_states = (SyncStandbySlotsStateInfo *)
>> + palloc(sizeof(SyncStandbySlotsStateInfo) * synchronized_standby_slots_config->nslotnames);
>>
>> I personally prefer building the list incrementally with List * and lappend
>> rather than pre-allocating, since the list may often be empty in success cases.
>
> Good thought, but I would not prefer to change it just for that reason.
>
> a) The current allocation is small and straightforward, and it is
> always freed on both paths whether success or failure.
> b) Allocating before taking the lock avoids doing memory allocation
> work while holding ReplicationSlotControlLock.
> c) Eager allocation keeps the loop single-pass and simple, lazy
> allocation adds branching complexity, and if done inside the loop it
> happens under lock.
> d) The optimization benefit is usually tiny unless nslotnames is large
> (which is very unlikely)
>
>> 3.
>>
>> +<synopsis>
>> +[FIRST] <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">slot_name</replaceable> [, ...] )
>> +ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="parameter">slot_name</replaceable> [, ...] )
>> +<replaceable class="parameter">slot_name</replaceable> [, ...]
>> +</synopsis>
>>
>> The documentation mentions that the FIRST keyword is optional, but the code
>> and comments don't seem consistent. Some comments give the impression that
>> FIRST is required:
>>
>> + * FIRST N (slot1, slot2, ...) -- wait for first N in priority order
>>
>
> Okay, I follow. Since we now support NUM (standby_list) as explicit
> priority syntax (equivalent to FIRST NUM (...)) for
> synchronized_standby_slots, this should be fine. The only divergence
> from synchronous_standby_names is the plain list form: for
> synchronized_standby_slots, a plain list means wait for all listed
> slots, and this is documented.
>
>> Additionally, IsPrioritySyncStandbySlotsSyntax() only checks for the FIRST
>> keyword and doesn't handle the "N (slot1, slot2)" case where FIRST is omitted.
>>
>
> Yeah, it doesn't, thanks for raising this concern. It has been taken
> care of in the attached patch.
>
>> 4.
>>
>> Regarding the new tests, I think we can avoid testing the plain slot list
>> since that's already covered extensively in 040_standby_failover_slots_sync.pl.
>>
>
> 040_standby_failover_slots_sync.pl mainly tests single-slot behavior,
> so it does not replace this multi-slot ALL-mode pair in 053.
>
>> Instead, we could focus on testing edge cases for ANY and FIRST. I noticed
>> there are no tests for scenarios where logical decoding blocks when using ANY or
>> FIRST. For example, testing FIRST 1 (s1, s2) where s1 is alive but hasn't caught
>> up would be valuable (if possible to test in tap-test, maybe test via
>> recovery_min_apply_delay ?). This logic is more error-prone and required more
>> careful review than other cases.
>>
>
> I have added this test-case but as mentioned in one of my earlier
> comments as well, it could be flaky, I am not quite sure if it is good
> to have any sort of test-case that is non-deterministic.
>
>> BTW, Part E seems unnecessary since the patch reuses the sync_standby_names
>> parser. ISTM, testing the first_ prefix in this context doesn't add valuable
>> coverage.
>
> Unlike sync_standby_names, where a plain list is treated as FIRST 1
> mode, synchronized_standby_slots treats it as ALL mode. To handle this
> difference, the code checks whether FIRST ( was explicitly specified
> and, if so, adjusts the parser output to ensure it is treated as ALL
> mode instead. This test specifically validates that behavior.
>
>
> On Fri, Mar 20, 2026 at 4:58 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>
>>
>> I was reviewing the latest patch and here are few comments/suggestions
>>
>> 1.
>> + <para>
>> + The keyword <literal>FIRST</literal>, coupled with
>> + <replaceable class="parameter">num_sync</replaceable>, specifies
>> + priority-based semantics. Logical decoding will wait for the first
>> + <replaceable class="parameter">num_sync</replaceable> available
>> + physical slots in priority order (the order they appear in the list).
>> + If a slot is missing, logical, invalidated, or inactive, it will be
>> + skipped. However, if a slot exists and is valid and active but has
>> + not yet caught up, the system will wait for it rather than skipping
>> + to lower-priority slots.
>> + </para>
>>
>> This explains that it will skip missing, logical, invalidated, or
>> inactive slots while processing the list and deciding on which first N
>> slots to wait for, but I could not understand what would be the
>> behavior if after skipping a few invalid slots the remaining valid
>> slots are less than N, will it proceed for decoding? I think this
>> should be clarified in docs.
>>
>
> This has been taken care of in the attached patch.
>
>> 2. I see a lot of overlapping code between
>> check_synchronized_standby_slots() and
>> check_synchronous_standby_names(), can we do some refactoring to make
>> a common code?
>>
>
> I looked into both functions and found about ~20-25% of the code is
> shared. Given the relatively small overlap, I'm not sure it's worth
> refactoring, happy to do it if the consensus is that it improves
> maintainability.
>
>> I have just pass through the patch and will try to complete the review soon.
>
> Sure, thanks.
>
> PFA patch that addresses all the comments above.
Thanks for updating the patch. A minor nitpick:
1.
+typedef enum
+{
+ SS_SLOT_NOT_FOUND, /* slot does not exist */
+ SS_SLOT_LOGICAL, /* slot is logical, not physical */
+ SS_SLOT_INVALIDATED, /* slot has been invalidated */
+ SS_SLOT_INACTIVE, /* slot is inactive (standby not connected) */
+ SS_SLOT_LAGGING /* slot exists and is active but has not caught up */
+} SyncStandbySlotsState;
IIRC, trailing commas are now used after the last enum.
2.
+ slot_states = (SyncStandbySlotsStateInfo *)
+ palloc(sizeof(SyncStandbySlotsStateInfo) * synchronized_standby_slots_config->nslotnames);
With palloc_array() now available, it would be preferable.
>
> --
> With Regards,
> Ashutosh Sharma.
>
> [2. text/x-diff; v20260323-0001-Add-FIRST-N-and-ANY-N-syntax-support-to-synchronized.patch]...
--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | zengman | 2026-03-24 04:13:20 | Re:rewriteGraphTable: Fix missing RTEs in FROM clause by setting inFromCl=true |
| Previous Message | Nisha Moond | 2026-03-24 04:01:28 | Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? |