| From: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
|---|---|
| To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date: | 2026-03-26 11:52:52 |
| Message-ID: | CAE9k0P=nbNMVOJLdaghNqR9_zD=y1BhE0CstgXJghEjOn2KBCg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
Thanks for the review. Please find my comments inline below:
On Thu, Mar 26, 2026 at 12:03 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Mar 26, 2026 at 11:36 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Makes sense. The attached patch addresses this too.
> >
> > --
>
> Thanks Ashutosh. I have not yet looked at today's patch, please find a
> few comments from previous one:
>
> 1)
> I noticed a change in behavior compared to the HEAD.
>
> Earlier, inactive slots were considered blocking only if they were
> lagging (restart_lsn < wait_for_lsn). Now, inactive slots are treated
> as blocking regardless of their restart_lsn. I think we should revert
> to the previous behavior. It’s possible for a slot to catch up and
> then become inactive; in such cases, it should still be treated as
> caught up rather than blocking.
>
Yes, we shouldn't be skipping the inactive slots that have already
caught up. This has been completely addressed in the attached patch.
> 2)
> + case SS_SLOT_LAGGING:
> ..
> + errdetail("The slot's restart_lsn %X/%X is behind the required %X/%X.",
> + LSN_FORMAT_ARGS(slot_states[i].restart_lsn),
> + LSN_FORMAT_ARGS(wait_for_lsn)));
>
> Here restart_lsn can even be invalid. See the caller:
>
> if (!XLogRecPtrIsValid(restart_lsn) || restart_lsn < wait_for_lsn)
> {
> slot_states[num_slot_states].state = SS_SLOT_LAGGING;
> slot_states[num_slot_states].restart_lsn = restart_lsn;
> }
>
> I think log-messages should be adjusted accordingly to handle
> invalid-restart-lsn.
>
I don't see any problem even if the restart_lsn is invalid. The log
message uses LSN_FORMAT_ARGS which should be able to log lsn with
value 0/0. However, in case of invalid restart_lsn the existing log
message may look a little less informative, so I have adjusted it
slightly which might even take care of your comment.
> 3)
> + slots have caught up. Missing, logical, invalidated, or inactive
> + slots are skipped when determining candidates, and lagging slots
> + simply do not count toward the required number until they catch up,
> + so if fewer than <replaceable class="parameter">num_sync</replaceable>
> + slots have caught up at a given moment, logical decoding waits until
> + that threshold is reached.
> + i.e., there is no priority ordering.
>
> My preference wil be to start 'If fewer than num_sync slots have
> caught up at a given moment' as a new line to break this long
> sentence, ('so' can also be removed). But I will leave the decision to
> you.
>
Okay, as you said, I have broken the sentence into two parts.
> 4)
> + For example, a setting of <literal>ANY 1 (sb1_slot, sb2_slot)</literal>
> + will allow logical decoding to proceed as soon as either physical slot
> + has confirmed WAL receipt. This is useful in conjunction with
> + quorum-based synchronous replication
> + (<literal>synchronous_standby_names = 'ANY ...'</literal>), so that
> + logical decoding availability matches the commit durability guarantee.
>
> If we read this example in continuation of the previous explanation,
> the example feels incomplete and could benefit from clarifying what
> happens if none of the slots are available or caught up. how about:
>
> For example, a setting of ANY 1 (sb1_slot, sb2_slot) allows logical
> decoding to proceed as soon as either physical slot has confirmed WAL
> receipt. If none of the slots are available or have caught up, logical
> decoding will wait until at least one slot meets the required
> condition.
>
Agreed, this does look somewhat incomplete, so changed as per your suggestions.
> 5)
> If we fix point 1, I think the doc should be reviewed to determine
> whether any sections mentioning that inactive slots are skipped need
> to be updated.
>
Yeah, updated all such references in the doc file.
PFA patch addressing all the comments above and let me know for any
further comments.
--
With Regards,
Ashutosh Sharma.
| Attachment | Content-Type | Size |
|---|---|---|
| v20260326-0001-Add-FIRST-ANY-and-N-.-syntax-support-to-synchronized.patch | application/octet-stream | 51.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jim Jones | 2026-03-26 11:53:29 | Re: VACUUM FULL, CLUSTER, and REPACK block on other sessions' temp tables |
| Previous Message | Nazir Bilal Yavuz | 2026-03-26 11:27:15 | Re: meson vs. llvm bitcode files |