| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, surya poondla <suryapoondla4(at)gmail(dot)com>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Date: | 2026-03-26 06:33:15 |
| Message-ID: | CAJpy0uC1-Da8gmObTfZGFmh_reEFr8Evh3PyNvb+dxdG=J_EpA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Mar 26, 2026 at 11:36 AM Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>
> Makes sense. The attached patch addresses this too.
>
> --
Thanks Ashutosh. I have not yet looked at today's patch; please find a
few comments on the previous one:
1)
I noticed a change in behavior compared to HEAD.
Earlier, inactive slots were considered blocking only if they were
lagging (restart_lsn < wait_for_lsn). Now, inactive slots are treated
as blocking regardless of their restart_lsn. I think we should revert
to the previous behavior. It’s possible for a slot to catch up and
then become inactive; in such cases, it should still be treated as
caught up rather than blocking.
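As a rough illustration of the HEAD behavior described in point 1, here is a minimal standalone sketch. It is not the patch code: SlotView and slot_blocks() are hypothetical names, and XLogRecPtr is a plain stand-in for the PostgreSQL type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumption, for illustration). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, simplified view of a physical slot. */
typedef struct SlotView
{
	bool		active;			/* is a walsender attached right now? */
	XLogRecPtr	restart_lsn;	/* InvalidXLogRecPtr if never set */
} SlotView;

/*
 * A slot blocks only when it has not reached wait_for_lsn.  The active
 * flag is deliberately not consulted: a slot that caught up and then
 * went inactive still counts as caught up.
 */
bool
slot_blocks(const SlotView *slot, XLogRecPtr wait_for_lsn)
{
	assert(slot != NULL);

	if (slot->restart_lsn == InvalidXLogRecPtr)
		return true;

	return slot->restart_lsn < wait_for_lsn;
}
```

In this sketch the active flag could still drive which message gets logged for a blocking slot, but it does not change the blocking decision.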
2)
+ case SS_SLOT_LAGGING:
..
+ errdetail("The slot's restart_lsn %X/%X is behind the required %X/%X.",
+ LSN_FORMAT_ARGS(slot_states[i].restart_lsn),
+ LSN_FORMAT_ARGS(wait_for_lsn)));
Here restart_lsn can even be invalid. See the caller:
if (!XLogRecPtrIsValid(restart_lsn) || restart_lsn < wait_for_lsn)
{
    slot_states[num_slot_states].state = SS_SLOT_LAGGING;
    slot_states[num_slot_states].restart_lsn = restart_lsn;
}
I think the log messages should be adjusted to handle an invalid
restart_lsn.
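One possible shape for point 2, as a standalone sketch only: the real code would go through ereport()/errdetail(), whereas this uses snprintf() so it can be compiled on its own. format_lagging_detail(), LSN_HI and LSN_LO are hypothetical names (the macros stand in for LSN_FORMAT_ARGS), and the wording of the invalid-LSN message is just a suggestion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the PostgreSQL definitions (assumption, for illustration). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define LSN_HI(lsn) ((unsigned int) ((lsn) >> 32))
#define LSN_LO(lsn) ((unsigned int) (lsn))

/*
 * Build the detail text for a lagging slot.  An invalid restart_lsn gets
 * its own message instead of being printed as 0/0.
 * Returns the snprintf() result.
 */
int
format_lagging_detail(char *buf, size_t buflen,
					  XLogRecPtr restart_lsn, XLogRecPtr wait_for_lsn)
{
	assert(buf != NULL && buflen > 0);

	if (restart_lsn == InvalidXLogRecPtr)
		return snprintf(buf, buflen,
						"The slot's restart_lsn is invalid; the required LSN is %X/%X.",
						LSN_HI(wait_for_lsn), LSN_LO(wait_for_lsn));

	return snprintf(buf, buflen,
					"The slot's restart_lsn %X/%X is behind the required %X/%X.",
					LSN_HI(restart_lsn), LSN_LO(restart_lsn),
					LSN_HI(wait_for_lsn), LSN_LO(wait_for_lsn));
}
```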
3)
+ slots have caught up. Missing, logical, invalidated, or inactive
+ slots are skipped when determining candidates, and lagging slots
+ simply do not count toward the required number until they catch up,
+ so if fewer than <replaceable class="parameter">num_sync</replaceable>
+ slots have caught up at a given moment, logical decoding waits until
+ that threshold is reached.
+ i.e., there is no priority ordering.
My preference would be to start 'If fewer than num_sync slots have
caught up at a given moment' as a new sentence to break up this long
one ('so' can also be removed). But I will leave the decision to
you.
4)
+ For example, a setting of <literal>ANY 1 (sb1_slot, sb2_slot)</literal>
+ will allow logical decoding to proceed as soon as either physical slot
+ has confirmed WAL receipt. This is useful in conjunction with
+ quorum-based synchronous replication
+ (<literal>synchronous_standby_names = 'ANY ...'</literal>), so that
+ logical decoding availability matches the commit durability guarantee.
If we read this example as a continuation of the previous explanation,
it feels incomplete and could benefit from clarifying what
happens if none of the slots are available or caught up. How about:
For example, a setting of ANY 1 (sb1_slot, sb2_slot) allows logical
decoding to proceed as soon as either physical slot has confirmed WAL
receipt. If none of the slots are available or have caught up, logical
decoding will wait until at least one slot meets the required
condition.
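The ANY num_sync semantics in the wording above can be sketched as a simple count. This is a standalone illustration under stated assumptions, not the patch code: quorum_reached() is a hypothetical name, and an unavailable slot is modeled here as one whose restart_lsn is InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions (assumption, for illustration). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Return true once at least num_sync of the listed slots have reached
 * wait_for_lsn.  Slots that are unavailable or still lagging simply do
 * not count; there is no priority ordering among the slots.
 */
bool
quorum_reached(const XLogRecPtr *restart_lsns, size_t nslots,
			   size_t num_sync, XLogRecPtr wait_for_lsn)
{
	size_t		caught_up = 0;

	assert(restart_lsns != NULL || nslots == 0);

	for (size_t i = 0; i < nslots; i++)
	{
		if (restart_lsns[i] != InvalidXLogRecPtr &&
			restart_lsns[i] >= wait_for_lsn)
			caught_up++;
	}

	return caught_up >= num_sync;
}
```

With ANY 1 (sb1_slot, sb2_slot), the function returns true as soon as either entry reaches wait_for_lsn, and a caller would keep waiting while it returns false.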
5)
If we fix point 1, I think the doc should be reviewed to determine
whether any sections mentioning that inactive slots are skipped need
to be updated.
~~
<I have not reviewed the tests yet; will review.>
thanks
Shveta