Quick Links

Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication

From:	Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc:	Japin Li <japinli(at)hotmail(dot)com>, surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date:	2026-03-26 11:52:52
Message-ID:	CAE9k0P=nbNMVOJLdaghNqR9_zD=y1BhE0CstgXJghEjOn2KBCg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Thanks for the review. Please find my comments inline below:

On Thu, Mar 26, 2026 at 12:03 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Mar 26, 2026 at 11:36 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> >
> > Makes sense. The attached patch addresses this too.
> >
> > --
>
> Thanks Ashutosh. I have not yet looked at today's patch, please find a
> few comments from previous one:
>
> 1)
> I noticed a change in behavior compared to the HEAD.
>
> Earlier, inactive slots were considered blocking only if they were
> lagging (restart_lsn < wait_for_lsn). Now, inactive slots are treated
> as blocking regardless of their restart_lsn. I think we should revert
> to the previous behavior. It’s possible for a slot to catch up and
> then become inactive; in such cases, it should still be treated as
> caught up rather than blocking.
>

Yes, we shouldn't be skipping the inactive slots that have already
caught up. This has been completely addressed in the attached patch.

> 2)
> + case SS_SLOT_LAGGING:
> ..
> + errdetail("The slot's restart_lsn %X/%X is behind the required %X/%X.",
> + LSN_FORMAT_ARGS(slot_states[i].restart_lsn),
> + LSN_FORMAT_ARGS(wait_for_lsn)));
>
> Here restart_lsn can even be invalid. See the caller:
>
> if (!XLogRecPtrIsValid(restart_lsn) || restart_lsn < wait_for_lsn)
> {
> slot_states[num_slot_states].state = SS_SLOT_LAGGING;
> slot_states[num_slot_states].restart_lsn = restart_lsn;
> }
>
> I think log-messages should be adjusted accordingly to handle
> invalid-restart-lsn.
>

I don't see any problem even if the restart_lsn is invalid. The log
message uses LSN_FORMAT_ARGS which should be able to log lsn with
value 0/0. However, in case of invalid restart_lsn the existing log
message may look a little less informative, so I have adjusted it
slightly which might even take care of your comment.

> 3)
> + slots have caught up. Missing, logical, invalidated, or inactive
> + slots are skipped when determining candidates, and lagging slots
> + simply do not count toward the required number until they catch up,
> + so if fewer than <replaceable class="parameter">num_sync</replaceable>
> + slots have caught up at a given moment, logical decoding waits until
> + that threshold is reached.
> + i.e., there is no priority ordering.
>
> My preference wil be to start 'If fewer than num_sync slots have
> caught up at a given moment' as a new line to break this long
> sentence, ('so' can also be removed). But I will leave the decision to
> you.
>

Okay, as you said, I have broken the sentence into two parts.

> 4)
> + For example, a setting of <literal>ANY 1 (sb1_slot, sb2_slot)</literal>
> + will allow logical decoding to proceed as soon as either physical slot
> + has confirmed WAL receipt. This is useful in conjunction with
> + quorum-based synchronous replication
> + (<literal>synchronous_standby_names = 'ANY ...'</literal>), so that
> + logical decoding availability matches the commit durability guarantee.
>
> If we read this example in continuation of the previous explanation,
> the example feels incomplete and could benefit from clarifying what
> happens if none of the slots are available or caught up. how about:
>
> For example, a setting of ANY 1 (sb1_slot, sb2_slot) allows logical
> decoding to proceed as soon as either physical slot has confirmed WAL
> receipt. If none of the slots are available or have caught up, logical
> decoding will wait until at least one slot meets the required
> condition.
>

Agreed, this does look somewhat incomplete, so changed as per your suggestions.

> 5)
> If we fix point 1, I think the doc should be reviewed to determine
> whether any sections mentioning that inactive slots are skipped need
> to be updated.
>

Yeah, updated all such references in the doc file.

PFA patch addressing all the comments above and let me know for any
further comments.

--
With Regards,
Ashutosh Sharma.

Attachment	Content-Type	Size
v20260326-0001-Add-FIRST-ANY-and-N-.-syntax-support-to-synchronized.patch	application/octet-stream	51.6 KB

In response to

Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication at 2026-03-26 06:33:15 from shveta malik

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim Jones	2026-03-26 11:53:29	Re: VACUUM FULL, CLUSTER, and REPACK block on other sessions' temp tables
Previous Message	Nazir Bilal Yavuz	2026-03-26 11:27:15	Re: meson vs. llvm bitcode files